PREFACE


The first edition of this book was issued in 1939, and a second edition 
appeared in 1946. Although some minor alterations were made at that time, 
and the text was clarified in places, the scope and character of the book 
remained essentially unchanged. It dealt with descriptive statistics and 
relegated almost all considerations of sampling and statistical inference to 
Part Two, where a mathematically more adequate treatment was possible. 

Within comparatively recent years a great change has come over the teach- 
ing of elementary statistics. Whereas formerly the emphasis was on the 
technique of organizing and classifying the data supplied by observation or 
experiment so as to expose their essential characteristics, the tendency now is 
to stress the limitations of statistical inference and the uncertain nature of 
conclusions from observational data. Ever since W. S. Gosset, in 1908, gave
a method of estimating the error in the mean of a small sample, it has indeed 
been known that the traditional statistical methods were inadequate, but the 
mathematical difficulties of any rigorous treatment of statistical inference 
kept this topic out of the elementary classroom. For some time it was
thought that a sound treatment could not be given without using advanced 
mathematics, and it is still not easy to do so, but latterly it has been found 
possible to develop a satisfactory exposition of the principal techniques of 
inference, suitable to the student with a very limited mathematical background.
The important thing is to know what assumptions are made in the
mathematical treatment and to be able to judge whether these assumptions 
can reasonably be regarded as approximately fulfilled in the data under test. 

The present edition (the third) represents a radical revision and extension 
of the text, in line with the ideas stated in the previous paragraph. There is 
still a need for descriptive statistics, and the first eight chapters deal with topics 
which may be included under this general heading. These chapters require 
little mathematics beyond elementary algebra. The notation of the definite 
integral has been used for convenience in Chapter VIII (on the Normal Law), 
but the simple interpretation as an area is all that is necessary. 

Some notions of probability are essential for the understanding of statistical 
inference, and Chapter IX gives a treatment based on the rules of combination 
for relative frequencies. Prior knowledge of the elementary algebra of permutations
and combinations is not assumed. The sections on continuous
probability distributions use very simple notions of calculus (the integration
of x^n for positive integral values of n), but these sections are not essential for
later topics in this book.

Chapters X to XIII introduce the ideas of significance, of confidence levels,
and of the testing of hypotheses, as applied to several important types of
statistic such as the proportion of individuals in a sample having a certain
characteristic, the sample mean for a measured variable, the difference of two
sample means, the sample variance, and the ratio of two variances. Nonparametric
tests (of goodness of fit and randomness) are discussed in Chapter XIII,
as well as some small-sample tests for order statistics.

Chapters XIV to XVI, on time series, regression, and correlation, have been 
considerably expanded from the treatment in the earlier editions, to include
tests of significance for regression and correlation coefficients, the use of
chi-square for contingency tables, and Fisher's method for 2 × 2 tables.

With the idea of making the book more suitable than before for classes in 
business statistics, a chapter on Index Numbers has been added as an illus- 
tration of weighted averages, and the usual methods of analysis of time series 
have been described in Chapter XIV. However, topics which are clearly 
nonmathematical, such as the preparation of diagrams and statistical tables,
or the precise wording of questionnaires, or the techniques of sampling in an
actual survey, have been passed over very lightly indeed. It is extremely
doubtful whether a first course in statistics is the place for an extended treat- 
ment of these topics. 

Worked examples have been freely used throughout the text. Some of 
these have been chosen from Canadian statistical sources (one of the authors 
being a Canadian), but the great majority are American. Some of the sets of 
data used are artificial, but they illustrate the point at issue. An instructor
who feels that it is important for his students to work on up-to-date material
can probably find what he wants somewhere in the flood of recent official
Government or United Nations statistical publications, or in the records of
experimental work carried out by himself or his colleagues. We feel that an 
understanding of the method he is using is more important for the student of 
statistics than the particular data on which he practices. 

The tables in the Appendix, and interspersed in the text, have been consider- 
ably increased in number, so that the student now has at hand all the tables 
necessary for the application of the common statistical tests. (The table of 
logarithms that was included in the earlier editions has been omitted, as such 
tables are readily available.) A copy of Barlow’s Tables of Squares, Cubes, 
Square Roots, Cube Roots and Reciprocals (New York, Tudor Publishing Co., 
Inc.) is very convenient to have at hand when engaged in computing, and 
Fisher and Yates’ Statistical Tables for Biological, Agricultural and Medical 
Research (Oliver and Boyd) is an invaluable reference. For permission to 
reprint tables or parts of tables, the authors are grateful to Sir Ronald Fisher 
(Tables II and III), Prof. G. W. Snedecor (Table IV), Dr. C. Eisenhart 
(Table 47), Dr. W. J. Dixon (Table 49), Dr. E. S. Pearson (Table V and Charts
I and II), Dr. P. C. Mahalanobis (Table 51), Dr. P. R. Rider (Table 53), and
Dr. J. W. Campbell (Table VI). 

The references at the ends of many of the chapters are intended to direct the 
attention of the elementary student to a few selected books or journal articles 
of not too technical a nature, from the reading of which he might be expected 
to profit. References are also given to a few articles from which tables or 
parts of tables have been quoted, so that the originals may be consulted in a 
good library; but no attempt has been made to give chapter and verse for all 
methods and formulas which are mentioned in the text. It would be impossi- 
ble, even if it were desirable, to mention all those books, papers, and lectures 
from which the authors have derived inspiration and upon which they have 
perhaps leaned. 

The authors are indebted to Professor A. T. Craig, State University of 
Iowa, and to Mr. Albert Shaw, formerly lecturer at the University of Alberta, 
and now Assistant Provincial Statistician, for reading the manuscript and for 
correcting several errors and dubious statements. For what errors and 
obscurities remain (and it is surely too much to hope that none remain!) 
the authors must accept the blame. 

J. F. K.
E. S. K.

FOREWORD


This book, in its present form, has been used as the basis for a 3-hour course 
(running for about 26 weeks) at the University of Alberta. Students come 
from the Faculties of Arts and Science and Education and have different
mathematical backgrounds, but many of them have only high school algebra. 
It has been found possible to cover most of the material, with the exception of 
§§12.15, 12.16, 13.10-13.16, 14.17, 15.12-15.14, 16.1, 16.2, 16.8, and 16.16, 
which are either passed over lightly or omitted altogether. In the usual two- 
semester course at an American university, which is rather longer than the 
Alberta year, some of this omitted material might be included. 

It is the practice at Alberta to combine a 3-hour-a-week laboratory period
with the lecture course. During these periods actual computations are carried
out, with the help of hand-operated calculating machines (the Monroe Educator
model), and reports are subsequently written up. If a laboratory is not
feasible, the students should at least work a considerable number of the numer- 
ical exercises at the ends of the chapters in order to gain facility in the hand- 
ling of data. 

A fuller treatment of some important topics, such as the Analysis of Variance,
which are touched on only lightly in this book, may be found in Part Two
of our Mathematics of Statistics, Van Nostrand, 2nd edition, 1951.




MATHEMATICS OF STATISTICS 

INTRODUCTION 

0.1 The Scope of Statistics. As the name implies, statistics originally
meant information useful to the state, for such purposes as taxation or the rais- 
ing of an army. Later, it came to mean quantitative data which tend to 
fluctuate in a more or less unpredictable way, and it is still used popularly in 
this sense, as when the newspapers talk about the statistics of highway acci- 
dents or of births, marriages, and deaths. 

More recently, statistics has usually meant the science (and art) concerned 
with the collection, presentation, and analysis of quantitative data so that 
intelligent judgments may be formed upon them, as exemplified, for instance, 
in the statistical reports presented by many large companies to their share- 
holders. For a great many practicing statisticians this is, in fact, the most 
important part of their work, the results of which are embodied in neat tables 
and diagrams. But although many difficult problems arise in the collection 
and processing of the raw data, these problems are not usually of a mathematical
nature, and a good deal of the work is of a routine character.

Of late years statistics has more and more come to the help of the other 
sciences, particularly the biological and social sciences, as an aid to the intelli- 
gent planning of experiments (so as to secure the maximum of information 
for a given expenditure of time and money) and as a means of assessing the 
significance of the results obtained by experiment. In the “exact” sciences,
such as physics and astronomy, with their relatively high precision of measure- 
ments, there did not appear to be the same need for statistical methods as in 
agriculture, medicine, economics, and many other fields, where the results of 
experiments are complicated by a multiplicity of factors beyond the control 
of the observer. For a long time, in fact, statistics in the physical sciences 
hardly progressed beyond the calculation of a standard error (or, more likely, 
a “probable error”) and the occasional fitting of a curve by least squares.
There is now, however, a keen appreciation among physicists of the fundamental
role of statistics in the treatment of the complex problems of molecular,
atomic, and nuclear structure.

Moreover, in a very different field, statistics has invaded industry, and 
statistical control is now applied by many large manufacturing concerns to 
ensure that the quality of their product remains reasonably constant. It is 
broadly true, however, that the bulk of modern applications of statistical 
methods is in the biological, psychological, and sociological sciences. 

From the point of view of these and other sciences, statistics may be
regarded as the technique of drawing valid conclusions from a limited body of
experimental or observational data. Occasionally, as in a decennial census 
of the whole population of a country, the results of observation may be 
regarded for practical purposes as exact, but such a census is extremely 
costly and time-consuming; as a rule, inferences must be made about a popu- 
lation on the basis of observations made on a few relatively small samples. 
Such inferences are not certain; they are merely more or less probable, and
the methods of modern statistics enable us to estimate the probability of any 
conclusions which may be drawn. 

0.2 Mathematics and Statistics. The proper treatment of observational 
data is primarily the concern of the observer who collects them. A large 
literature has grown up describing appropriate procedures for estimating the 
magnitude of observed effects and testing a variety of hypotheses regarding 
them. This body of material comprises what is known as Experimental 
Statistics. Usually, however, the procedures and tests of experimental statistics
are not really valid unless the observations satisfy some rather stringent
conditions, which they may easily fail to do. The discussion of the exact 
conditions under which these procedures are valid, and the development of 
new kinds of tests and new designs of experiments, are the concern of Mathe- 
matical Statistics. This subject is now a discipline of its own, recognized as 
a distinct department in several universities. It is a highly specialized sub- 
ject, which utilizes advanced and rather abstract mathematical ideas and is 
therefore well beyond the level of this book. Accordingly, no attempt will 
be made here to justify the use of the various tests described in the chapters
dealing with statistical inference. Proofs of some of these will be found in 
Part Two*, which is more advanced mathematically. 

0.3 Calculating Machines.† A full description of the parts of a calculating
machine and their operation may be obtained from an Instruction Book furnished
by the manufacturer, so only a brief description will be given here.

A calculating machine is constructed to add and subtract. By means of 
continued addition or subtraction, operations involving multiplication, divi- 
sion, and square root can also be performed with great speed. 

In addition to a keyboard on which numbers can be punched, most machines 
have a sliding carriage, carrying two dials one above the other. These dials 
are called revolution register (upper dial) and product register (lower dial). In 
finding a product nx, one of the factors n is punched on the keyboard and as
the motive crank at the side is turned, the other factor x appears on the upper 
dial. The product nx is then read from the lower dial. 
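
The way a product is built up from repeated additions can be imitated in a few lines of Python. This is only a sketch under stated assumptions: the function name `machine_multiply` is invented for illustration, the factors are taken to be non-negative whole numbers, and the shifting of the carriage from the units to the tens position is modeled as multiplication of the keyboard setting by a power of ten.

```python
def machine_multiply(n: int, x: int) -> tuple[int, int]:
    """Form n * x by repeated addition; return (revolution register, product register)."""
    if n < 0 or x < 0:
        raise ValueError("this sketch handles non-negative integers only")
    revolution_register = 0   # upper dial: builds up the multiplier x
    product_register = 0      # lower dial: builds up the product
    place = 1                 # carriage position: 1 = units, 10 = tens, ...
    remaining = x
    while remaining > 0:
        digit = remaining % 10
        for _ in range(digit):               # one pass per forward turn of the crank
            product_register += n * place    # keyboard setting added at this position
            revolution_register += place
        remaining //= 10
        place *= 10                          # shift the carriage one place
    return revolution_register, product_register


upper_dial, lower_dial = machine_multiply(6, 15)
print(upper_dial, lower_dial)   # the multiplier on the upper dial, the product on the lower
```

With n = 6 and x = 15 the crank is turned five times in the units position and once in the tens position, six turns in all instead of fifteen.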

* Kenney and Keeping, Mathematics of Statistics, Part Two, 2nd Ed., D. Van Nostrand
Co., Inc., New York, 1951.

† The early history of modern computing machines is outlined in the American Mathematical
Monthly, 31 (1924), pp. 422-429.

An important property of the modern calculating machine is its adaptability
to short cuts and combinations of operations. For example, one can
multiply two numbers n and x and add the result to a third number k without tabulating
the intermediate steps. This is accomplished by punching the number
k on the keyboard, transferring it to the lower dial (product register), and
then proceeding as in finding the product nx. The result nx + k is then read
from the lower dial. An extension of this procedure is especially useful in a
series of computations where k and n are constant and various values are
assigned to x. To describe the procedure, suppose it is required to calculate
the successive values of 12 + 6x for x = 5, 7, 15, 12, etc. The number k = 12
is first registered on the lower dial, then the factor n = 6 is placed on the keyboard,
and by turning the crank forward five times to make the first value
x = 5 appear on the upper dial, the result 12 + 6 × 5 appears on the lower
dial. Instead of clearing the dial, the crank is now turned forward twice
more to rebuild the value x = 5 into x = 7, and the result 12 + 6 × 7 can
be read from the lower dial. In rebuilding x = 15 into x = 12 the crank is
turned backwards. This procedure can be repeated until all the required
values of 12 + 6x have been calculated. A process of this sort is called the
continuous method of calculating.
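
The continuous method lends itself to a short illustration in Python. The sketch below is not a description of any particular machine: the name `continuous_method` is hypothetical, the values of x are assumed to be whole numbers, and each turn of the crank is modeled as adding or subtracting n once, without regard to the position of the carriage.

```python
def continuous_method(k: int, n: int, xs: list[int]) -> list[int]:
    """Return k + n*x for each x, never clearing the dials between values."""
    product_register = k       # lower dial: k is registered first
    revolution_register = 0    # upper dial: shows the current value of x
    results = []
    for x in xs:
        step = x - revolution_register
        direction = 1 if step > 0 else -1        # forward or backward turns
        for _ in range(abs(step)):
            product_register += direction * n    # one turn adds (or subtracts) n
            revolution_register += direction
        results.append(product_register)         # read the lower dial
    return results


values = continuous_method(12, 6, [5, 7, 15, 12])
print(values)   # [42, 54, 102, 84]
```

Each new result is obtained from the one before it, so the work done in reaching x = 5 is not repeated in reaching x = 7.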

In most of the exercises in this course, the computations are not particularly 
laborious and calculating machines are not really necessary. However, if 
machines are available they will be found a great help. Elaborate, fully 
automatic, electrically operated machines are not required; the small hand- 
operated models, such as the Monroe Educator, are quite good enough for 
practice. The trained statistician, of course, will profit by all the devices 
with which the modern high-speed machine is equipped.

0.4 Collateral Reading. Perhaps no single textbook can meet all the
needs of all students of statistics. There are several good books on elementary
statistics which, although not fundamentally different, present different points 
of view on certain topics and treat them with varying degrees of emphasis 
depending upon the field of major interest. At least some of the books listed 
below should be readily available on the reserve shelf of the library. The
list should be useful to those who wish to study more fully certain details in
which they may be interested.

1. F. E. Croxton and D. J. Cowden, Practical Business Statistics (Prentice-Hall, Inc.,
1948). Mainly nonmathematical, with emphasis on the procedures customary in business.
Contains some useful tables.

2. F. E. Croxton and D. J. Cowden, Applied General Statistics (Prentice-Hall, Inc.,
1939). An encyclopedic collection of statistical methods and elementary theory. Very
useful to the practicing statistician.

3. W. E. Deming, Statistical Adjustment of Data (John Wiley & Sons, Inc., 1943).
Concerned primarily with curve fitting by the method of least squares.

4. W. E. Deming, Some Theory of Sampling (John Wiley & Sons, Inc., 1950). A discussion
of the techniques and errors of sampling, with some general mathematical theory
of the design and analysis of sampling procedures.

CHAPTER I

FREQUENCY DISTRIBUTIONS 

1.1 Variables and Constants. A variable is a quantity which may take 
on any value from a given set of values, called its domain. Thus, the sex of 
an animal is a variable of which the domain contains the two values, male 
and female. The daily rainfall at a certain place is a variable of which the 
domain includes all rainfalls from zero up to some indefinite upper limit which 
is the largest daily rainfall conceivable. 

A variable whose domain contains only one value (in a particular situation 
or discussion) is called a constant. A constant which changes its value 
from one situation to another is often called a parameter. In the equation
y = 5x + c, representing a family of parallel straight lines, the c is a parameter.
Its value changes from one line to another, but is constant (say 3) for
all points (x, y) on one given line. 

Statistics is concerned with variables that fluctuate in a more or less unpredictable
way, such as the monthly total of highway accidents in the state of
New York or the daily yield of milk by a certain cow. These are said to be 
random variables. The particular day of the week corresponding to a date in 
the future, such as June 17, 2049, is not a random variable because a rule can 
be given for calculating it exactly, assuming that the present Gregorian cal- 
endar persists. There may be assignable causes with predictable effects on 
the total of highway accidents in a certain month, but no one would expect 
to be able to predict this total exactly. The essence of a random variable is
that some part of it is unpredictable. 
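
The remark about the day of the week can be checked directly, since the rule for calculating it is built into most programming languages. The following Python lines are a minimal sketch; they rely on the standard `datetime` module, which extends the present Gregorian calendar indefinitely in both directions, and the list `DAY_NAMES` is introduced here only for readability.

```python
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

future = date(2049, 6, 17)
index = future.weekday()        # 0 for Monday through 6 for Sunday
print(DAY_NAMES[index])         # the same answer on every run: nothing here is random
```

However often the calculation is repeated the answer is the same, which is exactly what distinguishes this variable from a random one.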

1.2 Variates. The raw material of statistics consists of numbers usually 
obtained by some process of counting or measurement. These are referred to 
collectively as the data. It is convenient to replace the values of a random 
variable by numbers. These numbers may be assigned in a rather arbitrary 
way, as when we denote a female by 0 and a male by 1, or they may be the 
numerical value, in suitable units, of a measurement, as when we represent 
the height of an individual by a certain number of inches or centimeters. In 
both instances we replace our variable by a variate, the domain of which is
always a set of real numbers, and which can be denoted by a letter such as 
X or y. Thus, if the variable is the face that comes to rest uppermost when 
an ordinary die is rolled, the corresponding variate is naturally the number of 
spots on this face, with a domain consisting of the real numbers 1, 2, 3, 4, 5, 6. 

It is evident from the foregoing illustrations that variates are of two kinds:
discrete and continuous. Discrete variates have domains restricted to isolated
real numbers, most frequently positive integers or zero. Examples are the
number of children in a family and the number of heads in ten tosses of a
coin, where the values are obtained by counting. Continuous variates correspond
to variables which are measured, theoretically to any degree of fineness.
Such variables, for instance, are height, weight, and temperature.
The domain of a continuous variate is an interval, or set of intervals, on the 
real number axis. 

In some investigations the variable cannot be accurately measured, but 
individuals can be ranked more or less accurately. Thus a foreman may 
rank a number of his workmen in order of competence or a judge in a taste 
trial may rank several varieties of ice cream in order of preference. The 
rather vaguely defined variable is thus replaced for statistical purposes by a 
discrete variate with domain consisting of the numbers 1, 2, 3, ...

The concept of randomness, as applied to a variable, means that to every 
value of the corresponding variate within its domain we can assign a probability,
namely, the probability that (in a given set of circumstances) that
particular value will actually be realized if a measurement or count is made.*
The term “probability” will be discussed in more detail later on, but for the
present we assume that its meaning is at least vaguely understood and that
it is a number between 0 and 1 inclusive, an impossible value having the
probability 0 and one that is absolutely certain the probability 1. If we toss
an ordinary coin repeatedly we know from experience that about half the time
it will come down heads and about half the time tails. We may think of a
variate x with the values 0 (for heads) and 1 (for tails). It seems reasonable
to allot to each of these a probability approximately 1/2, since we know that
both are about equally likely and one or the other of them must occur in any
toss.† In the same way, the probabilities for a well-made die corresponding
to each of the six possible values of x must be nearly 1/6, since we have good
reason to believe that each of the six faces is as likely to show up as any other
and it is certain that one of them will.
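
The appeal to experience in the case of the coin and the die can be imitated by a simulation. The Python sketch below assumes that the pseudo-random generator of the standard library is an adequate stand-in for a fair coin or a well-made die; the function name `relative_frequencies` and the numbers of trials are chosen only for illustration.

```python
import random
from collections import Counter


def relative_frequencies(outcomes: list[int], trials: int, seed: int = 0) -> dict[int, float]:
    """Simulate equally likely outcomes and return the relative frequency of each."""
    rng = random.Random(seed)                    # fixed seed, so the run is repeatable
    counts = Counter(rng.choice(outcomes) for _ in range(trials))
    return {value: counts[value] / trials for value in outcomes}


coin = relative_frequencies([0, 1], trials=10_000)             # 0 = heads, 1 = tails
die = relative_frequencies([1, 2, 3, 4, 5, 6], trials=60_000)  # faces of the die

print(coin)   # both values should lie close to 1/2
print(die)    # all six values should lie close to 1/6
```

The relative frequencies are only approximately 1/2 and 1/6 in any finite run, and they necessarily add up to 1 because one of the possible values must occur on every trial.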

If the variable is height, say of an adult male American, and the corre- 
sponding variate is the number of inches in the height of an individual, its 
domain is a continuous interval extending from, say, 50 to 85. The probabili- 
ties are, however, relatively extremely small near either end of the domain, 
since dwarfs and giants are rare, and relatively large between 65 and 75, since 
most adult males have heights somewhere in this range. 

* The terms “random variable,” “chance variable,” and “stochastic variable” are often
used in the literature as synonyms of “variate” in the sense given above.

† Cases in which the coin stands on end are ignored!

1.3 Errors of Measurement. If fifty boys each try to measure the length
of a classroom, along the floor between the two end walls, to the nearest
sixteenth of an inch, using the same ordinary 12-inch ruler, they will certainly
come up with a variety of answers. These answers will probably cluster
around an average value, with a few values scattering rather widely. If we
assume for the present that the average value is a good approximation to the 
true value, the differences of the various values from the average represent
errors of measurement. These errors constitute the random element in the 
measurement of the length. They are due to many causes, partly psychological,
but have two important characteristics: positive and negative errors
tend to occur about equally often and small errors are much more probable 
than large ones. Of course, gross errors, such as might be produced by read- 
ing a 3 on a scale as an 8, are excluded from consideration. 

Another kind of error may be due to imperfection in the measuring instrument
itself, and this error is called systematic or biased, as opposed to random.
Thus it may well happen that the 12-inch ruler used to measure the classroom 
is itself one-sixteenth of an inch short, and then all the measurements made 
with it will be too large. The errors are all in the same direction and do not 
tend to cancel when an average is taken. It has been found in accurate 
astronomical work, when observers have to time the exact instant of passage 
of a star-image over the crosshairs of a fixed telescope, that many people 
have systematic personal errors, tending always to be a little fast or a little 
slow in their reactions. Once this has been determined it can be allowed
for, and, in this particular type of observation, personal error can be elimi- 
nated altogether by using suitable photoelectric devices instead of the human 
eye. It is the aim of good experimental work to eliminate systematic errors 
as far as possible, but there practically always remains a residuum of unavoid- 
able error which is assumed to be random. 
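
The difference between random and systematic error shows up clearly when many measurements are averaged. The Python sketch below rests on several assumptions that are not part of the text: the room is taken to be 360 inches long, the random errors are taken to follow a normal law with a spread of half an inch, and the names `simulate_measurements` and `TRUE_LENGTH` are invented for the purpose.

```python
import random
import statistics

TRUE_LENGTH = 360.0   # inches; a hypothetical classroom


def simulate_measurements(bias: float, spread: float, count: int, seed: int = 0) -> list[float]:
    """Each reading = true length + systematic bias + a random error."""
    rng = random.Random(seed)
    return [TRUE_LENGTH + bias + rng.gauss(0.0, spread) for _ in range(count)]


# A ruler one-sixteenth of an inch short is really 11.9375 in. long but is read as 12 in.,
# so every length is overstated in the ratio 12 / 11.9375.
short_ruler_bias = TRUE_LENGTH * (12 / (12 - 1 / 16) - 1)

good_ruler = simulate_measurements(bias=0.0, spread=0.5, count=5000)
short_ruler = simulate_measurements(bias=short_ruler_bias, spread=0.5, count=5000)

good_error = statistics.mean(good_ruler) - TRUE_LENGTH
short_error = statistics.mean(short_ruler) - TRUE_LENGTH
print(round(good_error, 3), round(short_error, 3))
```

With the good ruler the error of the average is close to zero, because positive and negative errors tend to cancel; with the short ruler the average remains too large by nearly two inches however many readings are taken.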

If the measurement consists solely of counting a few well-defined objects, 
there is, of course, no error. If a man gives the number of his children to the 
census-taker as 6, that figure may be assumed exact. Yet if the total popula- 
tion of the United States is given as 150,697,361 (the census figure for April 1, 
1950) this number, even though obtained by counting, is almost certainly 
not exact. There are too many possible sources of error (people not counted
or counted twice, or born just after April 1 and counted, etc.) for us to believe
that the figure was correct even on the day of the census.

1.4 Accuracy. We have seen that most statistical data relate to variates 
which are subject to random error and therefore are only approximately cor- 
rect as measured. With continuous variates, the observed values as recorded 
can never be absolutely established by measurement. Thus, the height or 
weight of an object can be measured only approximately, the error depending 
upon the precision of the instrument and the care and accuracy of the observer. 
However, it is not always necessary that measurements be recorded as accu- 
rately as it is possible to make them. Similarly, with discrete variates the 
standard of accuracy used may be less than it is possible to obtain. In population
statistics, for example, it may be sufficient to record the numbers to
the nearest thousand, with three zeros at the end to fill out to the decimal
point. Thus,

City    Population
A       326,000
B       729,000

On the other hand, the exact number of students in a university might be 
required. The degree of accuracy needed is determined by the purpose of 
the investigation and it is limited by the closeness with which the variables 
can be measured. 

It follows, therefore, that the degree of accuracy in the final result of a 
problem involving computations is limited by that of the original data. 
Students sometimes carry results of problems to five or more decimal places 
when the original data do not justify more than two or three decimal places. 
A table of measurements which constitutes the raw data for a statistical 
investigation should always specify the degree of accuracy in the readings. 
Thus, if monthly rainfall is being measured to the nearest hundredth of an 
inch, and one measurement seems to be exactly 5 in., it should be recorded 
as 5.00 in., with two zeros. A measurement that is merely recorded as 5
means it is correct to the nearest integer and its true value lies between 4.5 
and 5.5, whereas 5.00 means the true value is known to lie between 4.995 and 
5.005. The three digits in 5.00 are said to be significant. 
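
The interval implied by a recorded value can be worked out mechanically from the way the value is written. The Python sketch below assumes that the reading is supplied as a string, so that trailing zeros are not lost, and that it is correct to within half a unit in the last place recorded; the name `implied_interval` is hypothetical.

```python
from decimal import Decimal


def implied_interval(recorded: str) -> tuple[Decimal, Decimal]:
    """Return the limits between which the true value lies, given the recorded digits."""
    value = Decimal(recorded)
    last_place = value.as_tuple().exponent        # 0 for "5", -2 for "5.00"
    half_unit = Decimal(1).scaleb(last_place) / 2
    return value - half_unit, value + half_unit


print(implied_interval("5"))      # limits 4.5 and 5.5
print(implied_interval("5.00"))   # limits 4.995 and 5.005
```

The string "5.00" carries information that the number 5 does not, which is why the zeros must be written down when the measurement is recorded.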

1.5 Significant Figures. A clear understanding of the meaning of significant
figures is important in numerical work. In a number recorded as the
result of a measurement, all digits except zero are always significant. Zeros 
which commence a number are nonsignificant; they are merely position-fillers 
necessary to indicate the meaning of the first significant digit. Thus in 
0.00327 there are only three significant figures, and the 3 means 3 thousandths. 
Zeros which conclude a number are significant if they follow the decimal 
point, as in 5.200, but if they lie to the left of the decimal point they may or 
may not be significant. It is impossible to tell merely by looking at the 
number 186,000 whether there are three, four, five, or six significant figures 
in it. If the number is written in scientific notation as 1.86 × 10^5, we know,
however, that only three figures are significant. To indicate that six figures
are significant we should write it as 1.86000 × 10^5, and this would mean that
the number lies between 185,999.5 and 186,000.5. Zeros which occur elsewhere
than at the beginning or the end are always significant, as in 1.002.
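
The rules just stated for zeros can be collected into a short routine. The Python sketch below is an illustration only: the name `significant_figures` is invented, numbers are supplied as plain decimal strings (scientific notation is not handled), and where trailing zeros to the left of the decimal point leave the matter in doubt the routine returns the smallest and largest possible counts.

```python
def significant_figures(recorded: str) -> tuple[int, int]:
    """Return (fewest, most) significant figures consistent with the written number."""
    text = recorded.replace(",", "").lstrip("+-")
    digits = text.replace(".", "").lstrip("0")    # zeros which commence a number never count
    if "." in text:
        return len(digits), len(digits)           # concluding zeros after the point count
    core = digits.rstrip("0")                     # concluding zeros before the point are doubtful
    return len(core), len(digits)


for number in ["0.00327", "5.200", "186,000", "1.002"]:
    print(number, significant_figures(number))
```

For 186,000 the routine can say only that there are between three and six significant figures, which is the ambiguity that scientific notation removes.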

In multiplying or dividing numbers time is saved, and a delusive appearance
of accuracy is avoided, if the numbers with most significant figures are first
rounded off. If no number contains fewer than s significant figures, the others
should be rounded off, if necessary, to s + 1 figures. After the calculation
has been performed, the result is rounded off to s figures. Thus, if we wish
to evaluate v = (4/3)πr^2 h, where r = 22.264 and h = 7.2, these being measured
values, we round off r to 3 figures as 22.3. The number 4/3 is exact and therefore
not subject to error. The number π is also exact but can be approximately
represented by a decimal with as many significant figures as required.
Here it may be taken to 3 figures as 3.14. The calculation gives v = 14,990,
which may be rounded off as 1.50 × 10^4. According to the foregoing rule,
only two significant figures should be kept, since h has only two figures, and
this would mean writing v = 1.5 × 10^4, but it is better to interpret the rule
somewhat liberally. Remember that a 3-digit number beginning with 1 is
not very different in magnitude from a 2-digit number beginning with 8 or 9.
The rules of significant figures should be applied with common sense.
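
The arithmetic of this example can be retraced in Python. The sketch assumes that the formula is v = (4/3)πr^2 h, which is the reading consistent with the stated result of 14,990; the helper `round_sig` is a hypothetical convenience for rounding to a given number of significant figures and is not a library function.

```python
from math import floor, log10


def round_sig(value: float, figures: int) -> float:
    """Round value to the given number of significant figures."""
    if value == 0:
        return 0.0
    magnitude = floor(log10(abs(value)))          # position of the leading digit
    return round(value, figures - 1 - magnitude)


r = round_sig(22.264, 3)    # s + 1 = 3 figures, since h has s = 2
pi_3 = 3.14                 # pi taken to 3 figures
h = 7.2

v = (4 / 3) * pi_3 * r**2 * h
print(round_sig(v, 4))      # 14990.0, the value as calculated
print(round_sig(v, 3))      # 15000.0, that is 1.50 x 10^4
print(round_sig(v, 2))      # 15000.0, that is 1.5 x 10^4
```

Printed as ordinary numbers, the three-figure and two-figure results look alike; only scientific notation shows how many figures are claimed to be significant.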

In adding numbers of different orders of magnitude, it is not the significant 
figures that matter but the decimal places of the digits. Thus, if we have to 
add three measured lengths, 176 cm, 2.846 cm, and 0.03 cm, it is clear that the 
digits after the decimal point in the second and third numbers do not matter, 
since the first number may be anywhere between 175.5 and 176.5. The last 
two numbers are therefore rounded off as 3 and 0, and the sum is 179. If 
physical quantities are to be added, they must all be expressed of course, in 
the same units. 
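
The rule for addition may be sketched in the same spirit. The Python lines below assume that each length is supplied as a string exactly as it was recorded, and that all are in the same units; the name `add_measured` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def add_measured(*recorded: str) -> Decimal:
    """Round every term to the last decimal place of the coarsest term, then add."""
    values = [Decimal(item) for item in recorded]
    coarsest = max(value.as_tuple().exponent for value in values)   # 0 for "176"
    unit = Decimal(1).scaleb(coarsest)
    rounded = [value.quantize(unit, rounding=ROUND_HALF_EVEN) for value in values]
    return sum(rounded, Decimal(0))


total = add_measured("176", "2.846", "0.03")
print(total)   # 179
```

The term recorded only to the nearest centimeter governs the whole sum, and the digits after the decimal point in the other two terms contribute nothing.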

In subtracting two numbers of about the same magnitude, almost all the 
significant figures may be lost. Thus 19177.3 - 19171.6 = 5.7. The number
of significant figures has dropped from 6 to 2. This is an important
point, as some common types of statistical calculation involve such subtractions.
It is therefore good practice, in a calculation where subtractions occur,
to carry three or four more figures than are apparently justified by the accuracy
of the data. Figures can always be dropped, but they cannot be reinserted
if they have been dropped too soon. There is nothing for it then but
to start the calculation all over again. 
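
The loss of figures in the subtraction can be checked with exact decimal arithmetic. In this sketch the helper `sig_figs` is a hypothetical name; it simply counts the digits stored in a `Decimal`, which serves as the count of significant figures for numbers written without padding zeros.

```python
from decimal import Decimal

def sig_figs(d):
    """Count the digits held by a Decimal (hypothetical helper)."""
    return len(d.as_tuple().digits)

a = Decimal("19177.3")
b = Decimal("19171.6")
diff = a - b

print(diff)                     # 5.7
print(sig_figs(a), sig_figs(diff))  # 6 2
```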

With an automatic calculating machine it is easy, of course, to carry all the 
digits of which the machine is capable. This does little harm as long as the 
final result is properly rounded off to the degree of accuracy justified by
the data. 

In rounding off numbers, the last digit kept is increased by 1 if the first 
digit dropped is 5 or more, and is unaltered if the first digit dropped is 4 or
less. If the dropped part is 5 exactly, it is usual to increase the last digit 
kept by 1 if it is odd and to leave it unaltered if it is even. Thus 7.1257, 
3.525, and 4.735 are rounded off to three figures as 7.13, 3.52, and 4.74, respec- 
tively. This procedure insures that in many such roundings-off the numbers 
will be increased about half the time and decreased about half the time. 
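
The "round to even" convention described above is available in Python's `decimal` module as `ROUND_HALF_EVEN`. The sketch below is an illustration only; the function name `round_half_even` is an assumption, and the numbers are passed as strings so that they are not disturbed by binary floating-point representation.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_half_even(text, places):
    """Round a decimal string to `places` decimal places,
    sending an exact 5 to the neighbouring even digit."""
    quantum = Decimal(1).scaleb(-places)   # places=2 gives Decimal('0.01')
    return Decimal(text).quantize(quantum, rounding=ROUND_HALF_EVEN)

for value in ["7.1257", "3.525", "4.735"]:
    print(value, "->", round_half_even(value, 2))
# 7.1257 -> 7.13
# 3.525 -> 3.52
# 4.735 -> 4.74
```

Each of the three numbers has one digit before the decimal point, so two decimal places correspond to three figures.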

1.6 Sources of Data. Data for statistical investigation may be collected 
ad hoc (for example, by carrying out an experiment, interviewing selected 
individuals, or sending out questionnaires by mail), or may be obtained at 
second hand. The primary sources of data, those with the greatest reliability,
are issued by the responsible authorities that collect them. Examples
are the publications of the Bureau of the Census or those of the Bureau of
Agricultural Economics. Secondary sources of data are trade journals, news- 
papers, textbooks, and the like, and these should, if possible, be checked before 
being accepted as authoritative. 

The actual collection of data is not part of the mathematical aspect of sta- 
tistics, and little will be said about it here. It must never be forgotten, how- 
ever, that elaborate mathematical techniques of analysis cannot compensate 
for bias and crudity in the original data. It is not always a simple matter to 
frame definitions of categories that will be free from ambiguity or questions 
that will elicit exactly the information required; and measurements are subject
to human as well as instrumental errors. Moreover, in only a very few
cases, such as a decennial census of the population, can a complete count be 
attempted; usually we have to be content with results from a comparatively 
small sample. The mathematical techniques for drawing valid conclusions 
about the population from such samples depend upon the assumption that 
the sample is random, that is, that every individual in the population has an 
equal chance of being included in the sample, and this is often difficult to 
arrange. In practice, other schemes of sampling than the purely random 
one are often preferred (for example, the population is sometimes divided into 
classes or strata, and random samples are taken from each stratum), but 
there must always be some element of randomness about the sampling procedure.
Only thus can we be sure that the choice of individuals for the
sample is independent of any personal predilections or preconceived ideas on
the part of the investigator, and only then can the mathematical theory of
sampling be validly applied. 
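
The two sampling schemes mentioned above, simple random sampling and sampling within strata, can be sketched as follows. The population, the strata, and the function names are all hypothetical; the sketch assumes equal-sized samples from each stratum, which is only one of several possible allocations.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals, each equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a random sample of the same size from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical population of 100 numbered individuals in two strata.
population = list(range(1, 101))
strata = {"first half": population[:50], "second half": population[50:]}

srs = simple_random_sample(population, 10, seed=1)
strat = stratified_sample(strata, 5, seed=1)
print(len(srs), {name: len(sample) for name, sample in strat.items()})
```

In both cases the choice is left to the random number generator, not to the investigator.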

1.7 Classification and Tabulation. After the data have been collected in 
any statistical investigation the first step has to do with introducing order in 
the raw material. Usually we have some hundreds of observations which 
have been recorded merely in the arbitrary order in which they happened to 
be made. But in order to analyze a set of observations so that intelligent
judgments may be formed about it or so that comparisons may be made
between two sets, proper classification is necessary and of prime importance.

Most people, until they have tried, imagine that to collect and arrange data
in classes and in tables is a straightforward procedure involving no great
technique or experience. Although much can be learned from a careful
study of the illustrations and discussions that appear in the following pages
and the compilations of reputable bureaus such as the census volumes, nevertheless,
experience is the best teacher in effecting the most appropriate classification
for any set of data.

In carrying out the process of classification, it is natural to arrange the
results in tabular form, setting forth clearly and explicitly the information 
one wishes to present. In drawing up any table the following general rules 
should be observed: 
(1) Every table should be self-explanatory. The title should be short, but
not at the expense of clearness. It should usually tell us the "what, where,
and when" of the data, in that order: thus, "Death Rates per 100,000 of
Population, by Principal Causes, United States, 1949."

(2) Explanatory notes, when necessary, should be incorporated in the 
table, either directly under the descriptive title or directly under the body
of the table.

(3) If the headings of the various columns (often called captions or box-headings)
refer to descriptive categories, they should be placed, if possible,
in a natural order, and usually with the most important items in the first
column. A column of totals is usually placed at the right, although U. S.
government publications prefer the totals on the left.

The headings in the various horizontal rows constitute what is called the
stub. The same principles apply to the stub as to the captions, the first row
being generally the most important. Totals are usually (except by the U. S. 
government) placed at the bottom. Ordinarily a larger number of items will 
be placed in the stub than in the captions, as it is more convenient to run the 
eye down a vertical column than across a horizontal row. 

(4) In tabulating long columns of figures, spaces should be left after every
five or ten rows. Long unbroken columns are confusing, especially when one
is comparing two numbers in a row but in widely separated columns.

(5) If the numbers tabulated have more than four or five significant figures, 
the digits should be grouped in threes or fours. Thus, one should write 
4 685 732, not 4685732. 


Table 1. Principal Statistics of Manufacturing Industries, Canada, 1944.
Classified by Origin of Material Used

Origin      Establishments    Employees    Salaries and      Cost of          Gross Value of
                                           Wages             Materials        Products
                 No.             No.           $                 $                 $

Farm            10,329         287,756      394,716,309     1,781,014,374     2,688,731,415
Mineral          4,479         634,512    1,208,779,764     2,258,796,792     4,708,104,244
Forest          10,347         186,680      278,171,969       495,531,476     1,082,160,284
Marine             535           9,664       10,327,695        45,906,542        68,882,879
Wild life          535           6,220        9,430,191        28,076,572        43,985,177
Mixed            2,258          98,050      128,195,442       223,007,600       481,828,520

Totals          28,483       1,222,882    2,029,621,370     4,832,333,356     9,073,692,519

Note: Mineral origin includes industries using imported iron, steel, etc.
Source: Canada Year Book, 1947, pp. 541-2.

(6) Double lines at the top (or at the top and bottom) may enhance the
effectiveness of a table. If the table nicely fills the width of the page, no side
lines should be used. In such cases the omission of the side lines will have 
the tendency to emphasize the other vertical lines and cause the interior 
columns to stand out better. 

The following points are particularly important in practical work: 

(7) Source of data should be included. 

(8) Units of the data presented should be clear. 

(9) Accuracy of transcription must not only be striven for but actually 
achieved. A reader who finds one error (even though this be the only one) 
is likely to disparage the whole table. 

A specimen table* is shown in Table 1. 

1.8 Frequency Distributions. From the standpoint of a mathematical 
analysis of statistics, the most important form of tabulation is the so-called 
frequency distribution. Rough data do not convey any clear idea of the 


Table 2. Grades of 100 Students in Freshman Mathematics

75  86  66  86  50  78  66  79  68  60
80  83  87  79  80  77  81  92  57  52
58  82  73  95  66  60  84  80  79  63
80  88  58  84  96  87  72  65  79  80
86  68  76  41  80  40  63  90  83  94
76  66  74  76  68  82  69  75  36  34
65  63  85  87  79  77  76  74  76  78
75  60  96  74  73  87  52  98  88  64
76  69  60  74  72  76  57  64  67  58
72  80  72  66  73  82  78  45  75  56


Table 3. Frequency Table of 100 Grades

Class Limits    Tally Marks                                 Frequency

30-39           //                                               2
40-49           ///                                              3
50-59           [illegible]                                 [illegible]
60-69           ///// ///// ///// /////                         20
70-79           ///// ///// ///// ///// ///// ///// //          32
80-89           ///// ///// ///// ///// ///                     23
90-99           ///// //                                         7

Total                                                          100


* Throughout the book, tables of data are presented purely as illustrations of statistical 
techniques and procedures. Whether the information presented in them is absolutely 
up-to-date is irrelevant. 
Table 4. Monthly Rainfall (Inches) at Iowa City for
36 Consecutive Years

Year   Jan.   Feb.   Mar.   Apr.   May    June   July   Aug.   Sept.  Oct.   Nov.   Dec.

  1    2.75   0.75   1.80   1.83   2.20   7.99   0.30   2.29   1.44   2.11   1.56   0.31
  2    1.49   1.30   4.41   1.11   4.46   2.80   3.01   3.45   2.33   1.63   2.93   2.72
  3    1.46   1.23   3.15   4.30   9.23   8.29   6.20   2.50   1.18   1.02   1.38   2.84
  4    1.18   1.75   2.82   4.37   1.79   3.01   3.56   1.64   3.07   1.98   1.75   1.52
  5    1.95   1.64   2.03   2.72   3.09   2.40   0.90   2.40   4.96   2.30   1.80   0.98
  6    2.37   0.64   1.25   1.66   4.26   1.10  10.10   1.77   3.43   1.38   1.78   2.84
  7    0.70   1.51   0.92   5.14   4.10   1.86   7.04   2.44   1.82   2.74   1.16   0.55
  8    3.66   1.30   2.07   4.60   3.11   2.38   3.83   1.85   3.54   0.33   1.98   2.48
  9    4.62   1.15   3.02   2.89   4.80   3.26   2.27   2.85   2.54   4.38   1.10   0.53
 10    0.59   1.82   1.43   3.23   9.49   4.50   3.78   2.39   0.93   1.66   1.15   1.93
 11    0.73   2.20   3.32   3.31   4.31   2.18   5.25   6.27   4.35

subject matter unless they are organized and condensed in a systematic way.
We therefore partition the raw data into classes of appropriate size, showing 
the corresponding frequency of variates in each class. When any set of sta- 
tistics is arranged in this way it is called a frequency distribution. For ex- 
ample, upon a cursory examination of the raw data of Table 2 it is difficult 
to state any very definite conclusions as to whether these grades represent
preponderantly good students or poor ones. The frequency distribution of 
Table 3, however, does give us more precise information. We see at a glance
that there were 32 students with grades between 70 and 80, and that all but
16 had grades of 60 or above. In Table 4, the confusion of detail is still more
apparent. The corresponding frequency distribution is given in Table 5.

Table 5. Frequency Table of Monthly Rainfall (Inches) at Iowa City

Class Interval     Mid-x     Frequency

 0.00- 0.49        0.245        23
 0.50- 0.99        0.745        42
 1.00- 1.49        1.245        58
 1.50- 1.99        1.745        62
 2.00- 2.49        2.245        49
 2.50- 2.99        2.745        47
 3.00- 3.49        3.245        32
 3.50- 3.99        3.745        27
 4.00- 4.49        4.245        18
 4.50- 4.99        4.745        15
 5.00- 5.49        5.245        14
 5.50- 5.99        5.745         7
 6.00- 6.49        6.245        10
 6.50- 6.99        6.745         5
 7.00- 7.49        7.245         6
 7.50- 7.99        7.745         5
 8.00- 8.49        8.245         3
 8.50- 8.99        8.745         2
 9.00- 9.49        9.245         5
 9.50- 9.99        9.745         0
10.00-10.49       10.245         1
10.50-10.99       10.745         1

Total                          432


The width of a class is called the class interval, and in general the successive 
class intervals should be of equal width. The midvalue of such an interval 
is variously called the class mark, midvalue, or central value. The width of a
class interval is therefore seen to be the common difference between two consecutive
class marks. It is also the difference between the lower (or upper)
limits of two successive classes. Thus, in Table 5, the class interval is half an
inch and the successive class marks are 0.245, 0.745, etc., inches.
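
The relation between class limits, class marks, and the class interval can be checked with a few lines of Python. This is a sketch using the first four classes of Table 5; the variable names are assumptions, and the results are rounded to three decimals to suppress floating-point noise.

```python
# Class limits of the first four classes of Table 5 (inches).
limits = [(0.00, 0.49), (0.50, 0.99), (1.00, 1.49), (1.50, 1.99)]

# The class mark is the midvalue of the two limits.
marks = [round((lower + upper) / 2, 3) for lower, upper in limits]

# The class interval is the common difference between consecutive marks.
widths = [round(b - a, 3) for a, b in zip(marks, marks[1:])]

print(marks)   # [0.245, 0.745, 1.245, 1.745]
print(widths)  # [0.5, 0.5, 0.5]
```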

For some kinds of table, equal intervals would be inconvenient. This is so 
in the distribution of incomes among Canadian taxpayers (Table 6), where 
there are so many taxpayers in the low income brackets that a comparatively 
fine division of incomes is feasible. 


Table 6. Incomes of Canadian Taxpayers, 1929

Income (dollars)      Frequency

Under 2,000             36,857
 2,000- 3,000           22,374
 3,000- 4,000           19,408
 4,000- 5,000           15,049
 5,000- 6,000            9,529
 6,000- 7,000            6,833
 7,000- 8,000            3,950
 8,000- 9,000            2,785
 9,000-10,000            2,185
10,000-15,000            5,520
15,000-20,000            2,197
20,000-25,000            1,027
25,000-30,000              579
30,000-50,000              847
50,000 and over            523

Total                  129,663

Source: Canada Year Book, 1930.


In the very high income groups the spread is so great that, if the same class 
interval were used as for the low incomes, there would be very many classes, 
and some of them might be empty. In such a table as this it is usual to have 
an open interval at one end or the other, that is, an interval with one of its 
limits indeterminate. Thus, in Table 6 the last class is "$50,000 or more."
Another reason for unequal and open intervals, especially in government 
publications, is to protect anonymity. In tables dealing with output, costs, 
etc., for manufacturing firms, it might well happen that one or two large firms 
would find themselves alone in the upper classes of such tables, so that anyone 
with inside knowledge of the industry could identify them and so obtain 
confidential information. With wider classes these firms are grouped 
with others. 

1.9 Class Intervals. Grouping variates into the most appropriate num- 
ber of classes is a matter of judgment. The choice of intervals to be used in 
tabulating any particular set of variates depends upon the nature and characteristics
of the data and the purpose for which they are to be used. For discrete
variates, the unit is a natural interval and sometimes it is satisfactory. (See
Tables 10 and 11, § 2.3.) However, for both discrete and continuous variates
the following conditions should guide the choice: 

(a) We desire to be able to treat all the values assigned to any one class, 
without serious error, as if they were equal to the class mark for that interval; 
e.g., as if all 23 items in the first class of Table 5 were exactly 0.245 inch, etc.

(b) For convenience and brevity we desire to make the interval as large as 
possible subject to the first condition. 

These conditions will generally be fulfilled if the interval is chosen so that 
the whole number of classes lies between 10 and 25, depending on the total 
frequency. A small number of classes may "cover up" too much detail
whereas a large number may reveal too much detail for one to comprehend
readily (which is just the objection to the table of original data). A preliminary
inspection of the data should accordingly be made and the highest
and lowest values selected. Dividing the difference between these by the
tentative number of classes, we have our approximate value of the interval.
After a little preliminary reconnoitering an appropriate number of classes and
their limits can be determined. Thus, in Table 4, the highest value noted
was 10.91 and the lowest 0.08 (verify). The difference between these is
10.83, which suggests that if we took 20 classes we would have approximately
a half inch as the width of a class interval. This, however, assumes we would
start with 0.08 as our lower limit, which would give us awkward figures as
limits. Therefore, our judgment suggests it would be better to start with 0
and continue by half-inch intervals as far as is necessary to take in the range 
of the given variates. We have estimated it will take approximately 20 of 
these; actually it turns out to be 22. This number of intervals and their 
width are consistent with the general conditions (a) and (b) given above. 
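
The arithmetic of this example can be laid out as a short calculation. The sketch below uses the extreme values quoted in the text; the choice of a convenient width and starting point remains a matter of judgment and is entered by hand, and the variable names are assumptions.

```python
import math

lowest, highest = 0.08, 10.91       # extremes quoted for Table 4
tentative_classes = 20

value_range = highest - lowest                  # about 10.83
rough_width = value_range / tentative_classes   # about 0.54

width = 0.5    # convenient width close to rough_width (judgment)
start = 0.0    # convenient lower limit at or below the lowest value

# Number of classes of this width needed to reach the highest value.
n_classes = math.floor((highest - start) / width) + 1

print(round(value_range, 2), round(rough_width, 2), n_classes)  # 10.83 0.54 22
```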

In summary, useful rules for making a distribution are: 

(1) Determine the range of the table by finding the difference between 
the highest value and the lowest value among the items. 

(2) Determine the number of equal parts into which the range shall be 
divided. The size of the class interval and the number of intervals depend
upon the size and nature of the distribution. (Table 3 contains rather fewer
classes than is usually desirable but an interval of 10 units is quite conventional
in students' grades. An interval of 5 would be used if grades of A,
A−, B, B−, etc., were given instead of A, B, etc.)

(3) Arrange a sheet with three headings: class interval, tally marks, 
frequency. 

(4) Read off the items in the raw table and for each one record a mark, as 
shown in Table 3. 
(5) Write the sum of the marks in each row in the frequency column.
The sum of the frequencies should, of course, equal the total number of observations.
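
The five steps above can be sketched in Python for whole-number data such as grades. The function name `frequency_table` and its parameters are hypothetical; the sample consists of only the first ten grades listed in Table 2, so its frequencies are not those of Table 3.

```python
def frequency_table(values, start, width, n_classes):
    """Tally integer values into equal classes beginning at `start`."""
    counts = [0] * n_classes
    for v in values:
        k = (v - start) // width          # index of the class containing v
        if not 0 <= k < n_classes:
            raise ValueError(f"{v} lies outside the table")
        counts[k] += 1                    # one tally mark
    rows = []
    for k, f in enumerate(counts):
        lower = start + k * width
        rows.append((lower, lower + width - 1, f))
    return rows

grades = [75, 86, 66, 86, 50, 78, 66, 79, 68, 60]   # first ten grades of Table 2
table = frequency_table(grades, start=30, width=10, n_classes=7)

for lower, upper, f in table:
    print(f"{lower}-{upper}  {'/' * f:<6} {f}")

# Check of rule (5): the frequencies add up to the number of observations.
assert sum(f for _, _, f in table) == len(grades)
```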

Since in several calculations we assume that all the individuals in a class
have the same value of the variate as the class mark, it is worth while, if there 
seems to be some concentration of observations around special values (partic- 
ularly near the ends of the distribution), to try to arrange these concentrations 
to come somewhere near the middle of their class intervals. This is usually 
a minor consideration, however. 

1.10 Class Limits and Class Boundaries. The pairs of numbers written 
in the column of classes of a frequency distribution, and used in tallying the 
original observations into their various classes, are called the class limits. In 
Table 5 the limits of the third class are 1.00 and 1.49. The measurements
were, however, recorded to the nearest hundredth of an inch, so that any value
between 1.485 and 1.495 would have been recorded as 1.49. Similarly, a
value between 0.995 and 1.005 would have been recorded as 1.00. The
third class therefore actually includes all values between 0.995 and 1.495, and
these true limits are known as the class boundaries. Denoting the variate of
Table 5 by x, the class marks by $x_c$ (x-central), and the class boundaries by
$x_e$ (x-end), the first five classes of Table 5 are:


Class Limits    $x_e$     $x_c$

0.00-0.49       0.495     0.245
0.50-0.99       0.995     0.745
1.00-1.49       1.495     1.245
1.50-1.99       1.995     1.745
2.00-2.49       2.495     2.245


The intervals as determined by the class boundaries are adjacent, but as no 
recorded value can lie on a boundary there can be no ambiguity about the 
class to which any recorded value belongs. If the class boundaries were taken 
as 0.00, 0.50, 1.00, etc., as might at first appear more natural, and a recorded
value happened to be 0.50, we should not know whether to put it in the first
class or the second, and would have to put ½ in each.
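
The passage from class limits to class boundaries amounts to extending each limit by half the unit of recording. A minimal sketch follows, using exact decimal arithmetic; the function name `class_boundaries` and its arguments are assumptions introduced for illustration.

```python
from decimal import Decimal

def class_boundaries(lower_limit, upper_limit, unit):
    """Extend the class limits by half the recording unit on each side."""
    half = Decimal(unit) / 2
    return Decimal(lower_limit) - half, Decimal(upper_limit) + half

# Third class of Table 5, recorded to the nearest hundredth of an inch.
lo, hi = class_boundaries("1.00", "1.49", "0.01")
mark = (lo + hi) / 2

print(lo, hi, mark)   # 0.995 1.495 1.245
```

No value recorded to two decimals can equal 0.995 or 1.495, which is why no ambiguity arises.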

The distinction between class limits and class boundaries is of importance 
in certain calculations that are commonly made with frequency distributions. 
In some tables the column of classes contains only single numbers followed by 
dashes, as 10-, 20-, etc. These numbers are lower class limits.

1.11 Cumulative Frequencies. The frequencies so far considered may 
be called absolute frequencies, to distinguish them from relative frequencies 
(which are expressed as a fraction of the total frequency) and from cumulative 
frequencies.

Sometimes a statistical investigation is concerned with the number or 
percentage of observations which are "less than" or "more than" a given
value. This is frequently the case in educational tests and in wage or salary
statistics. Our chief interest in such cases may be the accumulated frequency
of the several class intervals up to some class boundary. Hence we
are led to form a cumulative frequency table. Such a table is built up by
successively adding the several (absolute) frequencies; thus: $f_1$, $f_1 + f_2$,
$f_1 + f_2 + f_3$, etc., as illustrated in Table 8, where the data of Table 7 are used.
We shall use N to denote the sum of all the frequencies. 

Table 7. Distribution of Intelligence Quotients (IQ's) of 905 School
Children from 6 to 14 Years of Age. (Derived from
L. M. Terman, The Measurement of Intelligence)

IQ          Number of Children

 55- 64             3
 65- 74            21
 75- 84            78
 85- 94           182
 95-104           305
105-114           209
115-124            81
125-134            21
135-144             5


Table 8. Cumulative Distribution of IQ's (Table 7)

Class Mark    Frequency     Upper Boundary    F<                  F</N
(Mid-x)       f             (End-x)

                              54.5              0                 0.000
 59.5           3 = f_1       64.5              3 = f_1           0.003
 69.5          21 = f_2       74.5             24 = f_1 + f_2     0.027
 79.5          78             84.5            102                 0.113
 89.5         182             94.5            284                 0.314
 99.5         305            104.5            589                 0.651
109.5         209            114.5            798                 0.882
119.5          81            124.5            879                 0.971
129.5          21            134.5            900                 0.994
139.5           5            144.5            905 = N             1.000


The cumulative frequency corresponding to the upper boundary of any class 
interval is the total absolute frequency of all values less than that boundary. 
This is denoted by F< (read as "F less than"). Sometimes frequencies are
cumulated from the bottom of the table, giving F> ("F more than"), which
is the total frequency greater than a boundary value. If we divide F< by N
we get the relative cumulative frequencies of the last column of Table 8. Thus,
we can readily see that about 88% of the children had IQ's less than 114.5 
and only about 11% less than 84.5. 
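
The cumulative columns of Table 8 can be reproduced from the frequencies of Table 7 by running sums. The sketch below is illustrative only; the variable names `F_less`, `F_more`, and `relative` are assumptions.

```python
from itertools import accumulate

f = [3, 21, 78, 182, 305, 209, 81, 21, 5]               # Table 7 frequencies
boundaries = [54.5 + 10 * k for k in range(len(f) + 1)]  # 54.5, 64.5, ..., 144.5

F_less = [0] + list(accumulate(f))      # total frequency below each boundary
N = F_less[-1]                          # sum of all frequencies
F_more = [N - F for F in F_less]        # total frequency above each boundary
relative = [round(F / N, 3) for F in F_less]

for b, F, rel in zip(boundaries, F_less, relative):
    print(f"{b:6.1f} {F:4d} {rel:.3f}")
```

The printed rows agree with the last three columns of Table 8; for instance the boundary 114.5 gives 798 and 0.882.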
The inverse operation to cumulating the frequencies is called "differencing"
and is usually denoted by Δ (delta). If S denotes any series of values, then
ΔS denotes the results obtained by subtracting the first value of S from the
second value, the second from the third, etc. Differencing a column of cumulative
frequencies obviously gives the absolute frequencies. Differencing a
column of F</N values gives the f/N values.
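
Differencing is easily expressed in code. In this sketch the function name `difference` is an assumption; applied to the cumulative frequencies of Table 8 it returns the absolute frequencies of Table 7.

```python
def difference(series):
    """Subtract each value from the one that follows it."""
    return [b - a for a, b in zip(series, series[1:])]

F_less = [0, 3, 24, 102, 284, 589, 798, 879, 900, 905]   # Table 8
print(difference(F_less))   # [3, 21, 78, 182, 305, 209, 81, 21, 5]
```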


Exercises 

1. How many figures are significant in

(a) (132.30 - 131 .64) (2.97 X 32.2/0.0048)^*^ 

(b) (13. 189)^(0.010524 -^V(0.03189y2 

if all the numbers are considered as measured values which have been rounded off? 

2. In Table 3 state (a) the width of the class-interval, 

(b) the class marks, 

(c) the upper class boundaries. 

3. Tabulate the grades of Table 2, using a class interval of 5. 

4. Borrow a statistical publication such as a Company Report to Shareholders, and 
examine critically the tables you find in it. How far do they meet the requirements
mentioned in the text?

5. (Walker) A study is to be made of school attendance in the 15 elementary schools of
a certain city. Draw up a suitable table in which could be shown the average register, the
average daily attendance and the percentage attendance, for boys and girls separately and
for both, in each school.

6. Consider the data of Table 9. What is the justification for the unequal class intervals?

Table 9. Age Distribution of Deaths of Infants Under 1 Year of Age,
in U.S.A. 1917 (Excluding Hawaii)

Age at Death                    Frequency

Under 1 day                       26,606
1 day                              8,364
2 days                             6,344
3 to 6 days                       12,375
1 week                            10,911
2 weeks                            7,717
3 weeks but under 1 month          6,212
1 month                           15,362
2 months                          12,066
3 to 5 months                     27,487
6 to 8 months                     20,409
9 to 11 months                    17,112

Total                            171,024

N.B. Still-births are excluded.
Source: U.S. Bureau of the Census, Mortality Statistics for 1917.

State the class boundaries. Draw up a cumulative frequency (F<) table. (Note that a
child is reckoned as 1 month old when it has completed 1 month but has not yet completed
2 months of life, and so with the other intervals.)

7. Use Table 5 to answer (a) how often was the monthly rainfall between 2 inches and
3 inches? (put this more exactly),

(b) how often was the rainfall less than 5 inches? 

(c) what is about the most common monthly rainfall? 

8. Difference the last column of Table 8. 
CHAPTER II

GRAPHICAL REPRESENTATION 

2.1 The Function Concept. Variables which are linked or related in some 
way are encountered in various fields of human experience. Several variables 
may be linked but we shall, for the present, consider the simple case where 
only two variables are involved. For example, the two related variables may 
be time and population, variate and frequency, rate of interest and accumu- 
lated principal, age and insurance premium. The primary purpose of a graph 
is to show diagrammatically how the values of one of two linked variables 
change with those of the other. One of the most useful applications of the 
graph occurs in connection with the representation of statistical data. 

Underlying the intelligent use of graphs is the concept of function, which is 
a fundamental notion in mathematics and its applications. The student 
usually meets the word for the first time in algebra, when a linear or quadratic 
expression is spoken of as a function of x. An example is the equation 

$$y = P(1 + x)^2$$

The expression on the right is the function of x (P being constant) and for 
convenience it is denoted by the single letter y. Here x may be an interest 
rate, say 0.04 or 4%, and y dollars the amount to which P dollars will accu- 
mulate in two years at 100x% per year. 
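
As a numerical illustration of this function, the following sketch evaluates it for a hypothetical principal of 100 dollars at x = 0.04. The function name `accumulated_amount` and the principal are assumptions made here.

```python
def accumulated_amount(principal, rate, years=2):
    """y = P(1 + x)^2 when years = 2: amount of P dollars at rate x per year."""
    return principal * (1 + rate) ** years

y = accumulated_amount(100, 0.04)   # hypothetical P = 100, x = 0.04
print(round(y, 2))                  # 108.16
```

To each value assigned to x there corresponds one value of y, which is what is meant by calling y a function of x.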

The statement that y is a function of x is written symbolically in the form

$$y = f(x).$$

This implies that a value of the function y is determined when a value is
assigned to the variable x. For this reason, x is called the independent variable
and y the dependent variable. In place of f other letters may be used. Thus,
any one of the symbols*

y(x), A(x), P(x), 0(x) 

and so on, denotes a function of x. The same symbol may be used to denote
different functions in different problems, but different symbols are required 
to represent different functions in the same problem or discussion. 

Examples: 

$f(x) = 5x^2 - 3x + 2$
0(x) « Ke^^ 

* Also, y(x) is used to mean y expressed as a function of x.

Any mathematical expression involving a variable x is a function of x.
However, the word is often used to designate a much more general relation. 
The central idea conveyed by this meaning is that of a correspondence between
values of y and values of x. The following definition is the result of a development
over a long period and its formulation is due to Dirichlet, a famous
German mathematician (1805-59).

Definition. Let there be a set of values assumed by the independent variable
x. If to each x in the set there corresponds one or more values of y, then y is
said to be a function of x in its domain.

It should be observed that this definition is freed from any notion of the 
necessity of specifying the mathematical relation between x and y. A mathematical
equation connecting x and y may not even exist.* A function may
thus be considered as being equivalent to a table in which one may look up
any x in its domain and find the corresponding y. 

Many of the data in statistics come under this general definition of function.
Thus, in the following table, net earning is a function of the year,
whether or not there is any equation defining
that functional relationship.

Here the function is defined only for the 
indicated points which correspond to the 
values given in the table. The straight 
lines are drawn to help the reader visualize 
the relative positions of these values and 
not to represent the function at intermedi- 
ate points. 

On the time axis a year should really be represented by an interval and 
not by a point, but the earnings actually spread over the whole year are here 
represented as a lump sum and concentrated at one point of time. The 
points on the broken line for intermediate times have no significance. A 
similar situation holds with discrete variates, such as the number of eggs laid 
weekly by a hen, which can obviously have only values 0, 1, 2, ...

If there is only one value of y corresponding to each value of x, y is called 
a single-valued function of x; otherwise y is a multiple-valued function. In
$y^2 = x$, y is a two-valued function of x, since either $\sqrt{x}$ or $-\sqrt{x}$ will satisfy
the equation. Child weight is a multiple-valued function of age, since there 
is a whole range of possible values of weight corresponding to a given age. 
The weight of a particular child is, however, a single-valued function of age. 
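
The idea of a function as a table to be looked up, and the distinction between single-valued and two-valued functions, can be sketched as follows. The dictionary holds the yearly net earnings quoted in this section (taken here to be in millions of dollars); the names `net_earnings`, `earnings`, and `square_roots` are assumptions.

```python
import math

# A function given only by a table: its domain is the five listed years.
net_earnings = {1948: 45.0, 1949: 43.0, 1950: 49.6, 1951: 51.5, 1952: 57.3}

def earnings(year):
    """Single-valued: one value per year in the domain (KeyError otherwise)."""
    return net_earnings[year]

def square_roots(x):
    """Two-valued: both values of y that satisfy y**2 == x, for x > 0."""
    root = math.sqrt(x)
    return (root, -root)

print(earnings(1950))     # 49.6
print(square_roots(9))    # (3.0, -3.0)
```

No formula connects the year with the earnings; the table itself is the function.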

2.2 Charts. A detailed study of the technique of representing data by 
bar-diagrams, pie-diagrams, etc., will not be undertaken here. It is a rather 
specialized and nonmathematical subject, and the interested student will
find ample information in references which are given at the end of the chapter. 

* A classical example is the function which is defined for the infinite set of numbers from
x = 0 to x = 1 to be unity for all rational numbers and zero for all irrational numbers.



Year    Million $

1948      45.0
1949      43.0
1950      49.6
1951      51.5
1952      57.3

Fig. 1. World Production of Non-Ferrous Metals (excluding U.S.S.R.): Annual Averages
Source: Annual Report of Aluminium, Ltd., 1951

Fig. 2. How the City of Edmonton Spends Its Dollar, 1951
Source: Statement by City Commissioner, 1951

We give one example of a bar-diagram and one of a pie-diagram to illustrate
good practice (Figs. 1 and 2). 

The scale of heights on a bar diagram should start from zero. Otherwise 
a misleading impression of the relative heights of bars may be given. The 
practice of replacing the bars by pictures drawn to different scales should be 
avoided, as it is often not clear whether the comparison is intended to be 
between linear dimensions, areas, or volumes. If the picture of a man is 
copied on double the linear scale, the area covered by the drawing is increased 
four times but the suggestion conveyed is that of a man with volume eight 
times as great. How is the picture to be understood? 
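
The ambiguity of scaled pictures is a matter of simple powers of the linear scale factor, as this small sketch shows. The function name `scale_effects` is an assumption.

```python
def scale_effects(k):
    """Effect of enlarging a picture by a linear factor k."""
    return {"linear": k, "area": k ** 2, "volume": k ** 3}

print(scale_effects(2))   # {'linear': 2, 'area': 4, 'volume': 8}
```

A quantity that has merely doubled may thus appear to the eye as four or eight times as large.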

2.3 Frequency Polygons. We present now a discussion of the graphs 
that are used in connection with frequency distributions. A distribution of 
values of a discrete variate may be represented graphically by plotting the 
points $(x_1, f_1), (x_2, f_2), \ldots, (x_k, f_k)$, and drawing a broken line through them.
Such a graph is called a frequency polygon because it is a polygon formed by 
connecting the tops of a series of ordinates whose lengths are proportional to 
the various frequencies and whose abscissas correspond to the variate values 

Dice-throwing Experiments (Weldon). Twelve dice were thrown 4096 times. In
Table 10, only a throw of 6 was reckoned a success.
In Table 11, either 4, 5, or 6 was so reckoned.


      Table 10                 Table 11

   x_i       f_i            x_i       f_i

    0        447             0          0
    1       1145             1          7
    2       1181             2         60
    3        796             3        198
    4        380             4        430
    5        115             5        731
    6         24             6        948
    7          7             7        847
    8          1             8        536
    9          0             9        257
   10          0            10         71
   11          0            11         11
   12          0            12          0

           4096                      4096


of the distribution. Fig. 3 will serve as an illustration. For a table of dis- 
crete variates the function exists only for the given values. Likewise, its 
graph is discontinuous. The straight lines connecting the points serve merely 
to "carry the eye," thus giving a better idea of the shape and position of
the distribution. 
Frequency polygons are also sometimes used for distributions grouped in
classes, but only when all the class intervals are equal. The frequency in 
any class is plotted against the class mark, and the points are joined by 



Fig. 3. Frequency Polygons for Distribution of Discrete Variates
(I: Frequency Polygon for the Distribution of Table 10;
II: Frequency Polygon for the Distribution of Table 11)

straight lines. The polygon should be brought down at both ends to the 
x-axis by joining it to the class marks of the nearest empty classes at each end 
of the distribution. However, it is usually preferable to use histograms for 
grouped distributions. 

2.4 Histograms. A histogram is a set of rectangles with bases along the 
intervals between class boundaries and with areas proportional to the fre- 
quencies in the corresponding classes. If the class intervals are equal, the 
heights of the rectangles are also proportional to the frequencies, as in Fig. 4, 
which represents the data of Table 7. 



Fig. 4. Histogram for Table 7 


For the data of Table 9, however, the rectangles in Fig. 5 have unequal 
bases and their heights are adjusted accordingly. It is area here, and not 
height, that represents frequency. To avoid excessive disproportion of 
heights, the data for the first month are telescoped together. A separate 
diagram for the deaths under 1 month might be constructed to give a truer 
picture. 



Fig. 5. Ages at Death of Infants under One Year, U.S.A. (1917)

Note that in a histogram the rectangles are all adjacent, since the bases
cover the intervals between class boundaries, not class limits. In a bar
diagram, on the other hand, the spacing and width of the bars are arbitrary, and
it is only the heights that count.
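The rule that area, not height, represents frequency can be sketched in a few lines. The class boundaries and frequencies below are hypothetical (they are not the data of Table 9); the point is only that each height is the frequency divided by the width of the base.

```python
# Hypothetical grouped data with unequal class intervals (not from the text).
boundaries = [0, 1, 2, 4, 8, 12]      # class boundaries
frequencies = [40, 20, 24, 16, 8]     # frequency in each class

# Width of each rectangle's base.
widths = [hi - lo for lo, hi in zip(boundaries, boundaries[1:])]

# Height = frequency per unit of x, so that height * width = frequency.
heights = [f / w for f, w in zip(frequencies, widths)]

# The areas of the rectangles reproduce the class frequencies.
areas = [h * w for h, w in zip(heights, widths)]

print(heights)  # [40.0, 20.0, 12.0, 4.0, 2.0]
```

With equal class intervals every width is the same, and the heights are then simply proportional to the frequencies.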

2.5 Frequency Curves. The histogram is a convenient device for representing
approximately the distribution of variate values in a sample consisting
of a finite number (often a few hundreds) of individuals. It is approximate 
because, by the method of representation by rectangles, we are assuming that 
within any one class the values of the variate are uniformly spread out between 
the class boundaries, and this is unlikely to be the case. The smaller we make 
the class intervals, the less likely is this assumption to be even approximately 
true. Moreover, with a fixed size of sample and very small class intervals
there is likely to be so much fluctuation in frequency between adjacent classes, 
and so many of them will be empty, that the nature of the distribution is 
obscured. However, if we suppose that the size of the sample is increased 
indefinitely, so that even with very small class intervals there are many
individuals in each class throughout the domain of the variate, the outline of the
histogram will approximate to a smooth curve. This curve may be regarded
as the frequency curve of the parent population from which the sample taken
is a random sample. (Randomness means that any individual in the population
is as likely to be picked for the sample as any other, and the population is
supposed to be practically infinite in number.) In practice, frequency curves
are often fitted to histograms, either by eye or more usually by calculation, 
using the known mathematical properties of certain curves which seem to be 
of about the right shape. Thus, Fig. 6 represents the data of Table 12, and 


Table 12. Frequency Distribution of the Weights of 1000 Male
Students (Original Measurements Made to Nearest Half Pound)

Class            Class                    Cumulative
(Pounds)         Mark       Frequency     Frequency
 90 -  99.5       94.75          2              2
100 - 109.5      104.75         21             23
110 - 119.5      114.75        104            127
120 - 129.5      124.75        196            323
130 - 139.5      134.75        248            571
140 - 149.5      144.75        197            768
150 - 159.5      154.75        133            901
160 - 169.5      164.75         47            948
170 - 179.5      174.75         25            973
180 - 189.5      184.75         14            987
190 - 199.5      194.75          7            994
200 - 209.5      204.75          4            998
210 - 219.5      214.75          0            998
220 - 229.5      224.75          0            998
230 - 239.5      234.75          1            999
240 - 249.5      244.75          1           1000


a dotted frequency curve has been superimposed on the histogram. The 
total area under the curve, like the total area of the histogram, represents the 
total frequency (in this case, 1000). Moreover, the area between two ordi- 
nates of the frequency curve, as a fraction of the whole area, represents the 
probability that an individual selected at random from the population will 
have a variate value between the corresponding abscissas. In Fig. 6 the 
shaded area represents that fraction of the population which has a weight 
between 119.75 and 129.75 lb. The population in this example is certainly 
not infinite, but it may be regarded as very large. It includes all persons in 
the category from which the actually measured sample of 1000 may be re- 
garded as drawn (say all white American male students attending university 
at the time the sample was taken). The observed relative frequency
corresponding to the same interval (represented by the area of the rectangle of
the histogram on the same base, as a fraction of the total area) will differ
more or less from this probability because of the inevitable variation between
one sample and another, even though picked at random from the same 
population. 



Fig. 6. Frequency Distribution of the Weights of 1000 Male Students (Table 12)

If the fit of the frequency curve to the histogram is reasonably good, these 
differences will not be excessive. A method of judging whether the fit is 
satisfactory or not will be given later after the normal curve has been discussed, 
this curve being one of the most easily fitted types of theoretical frequency 
curve. Many commonly occurring frequency distributions can be approxi- 
mately represented by symmetrical or skew humpbacked curves with mathe- 
matically determined properties. More extreme types of distribution may 
be J-shaped, like that of Fig. 5, and even U-shaped distributions (with lower 
frequencies in the middle than at the ends) are occasionally encountered. 
A discussion of some theoretical curves often used in curve-fitting (Pearson 
curves and Gram-Charlier curves) may be found in Part Two.* 

2.6 Cumulative Frequency Polygons. If the cumulative frequency $F_<$ is
plotted against the upper class boundary ($x_e$) and the points are joined by
straight lines, we obtain a cumulative frequency polygon. The polygon
should start from zero at the lower boundary of the first interval. Fig. 7
gives the cumulative frequency polygon for the data of Table 8. Strictly
speaking, $F_<$ for a continuous variate is defined for the end values $x_e$ only,

* Kenney and Keeping, Mathematics of Statistics, Part Two, D. Van Nostrand Co., Inc.,
New York, 1951.
but if the assumption is made that the observations in any one class are
uniformly spread out over the whole interval, the intermediate points on the
polygon also represent the cumulative frequencies at the corresponding values
of x. This means that we can interpolate linearly between the class
boundaries, and we shall do this in the next chapter when calculating medians,
quartiles, etc. The straight sides of the polygon are more than just a device
to carry the eye, as in a broken-line diagram.



Fig. 7. Cumulative Frequency Polygon

If we are dealing with a discrete variate, as in Table 11, there will be a
sudden jump in the cumulative frequency at each observed value of x. The
cumulative frequency polygon is stepped, as shown in Fig. 8. Corresponding
to a value of x at which there is a jump, there are two values of the cumulative
frequency, one $F_<$ ("less than") and the other $F_\le$ ("less than or equal to").
The difference is the frequency at this value of x. It is conventional to define
the cumulative frequency for all values of x as the total frequency up to and
including the given value, that is, to use $F_\le$. The upper end of each vertical
rise on the staircase curve of Fig. 8 defines the cumulative frequency for the
corresponding x. For continuous distributions it makes no difference whether
we use $F_<$ or $F_\le$.
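The two cumulative frequencies at each jump can be tabulated directly. The following is a minimal sketch using the frequencies of Table 11, with the frequency at x = 8 taken as 536 so that the total is 4096 (an assumption about the printed table); the names `F_lt` and `F_le` are illustrative only.

```python
from itertools import accumulate

# Table 11 (Weldon): number of successes x and observed frequency f.
# The entry for x = 8 is assumed to be 536, which makes the total 4096.
x_values = list(range(13))
frequencies = [0, 7, 60, 198, 430, 731, 948, 847, 536, 257, 71, 11, 0]

# F_le[i]: total frequency up to and including x_values[i] ("less than or equal to").
F_le = list(accumulate(frequencies))

# F_lt[i]: total frequency strictly below x_values[i] ("less than").
F_lt = [total - f for total, f in zip(F_le, frequencies)]

# At each x the jump F_le - F_lt equals the frequency at that x.
for x, lt, le in zip(x_values, F_lt, F_le):
    print(x, lt, le)
```

At x = 6, for instance, the sketch gives 1426 for "less than" and 2374 for "less than or equal to", a jump of 948.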

If relative frequencies instead of absolute frequencies are used the cumula- 
tive polygon rises from the value 0 at the left to the value 1 at the right. 
These numbers are often multiplied by 100 to give percentage cumulative 
frequency polygons.

2.7 Ogive Curves. If a large number of boys stand in a straight line, in 
order of height, as represented diagrammatically in Fig. 9, the line joining the
tops of their heads is an approximate cumulative frequency curve, the fre- 
quency being in this case measured horizontally and the height vertically. 



Fig. 8. Cumulative Frequency Polygon
(horizontal axis: No. of Successes, 4, 5, or 6, with 12 Dice)

[Fig. 9; axis label: Height (in.)]
Thus, in the diagram there are 3 boys with a height less than 50 inches. From
its resemblance to the shape of a moulding in architecture, known as an ogee, 
this curve was called an ogive. The name is now applied to any continuous 
cumulative frequency curve, including the ordinary cumulative frequency 
polygon. Just as the histograms and frequency polygons can often be 
approximately fitted by smooth curves, so the corresponding cumulative 
frequency polygons can be approximately fitted by smooth ogives. Indeed, 
in practice it is usually easier to fit an ogive than a frequency curve. 

Note that the ordinate at any point of an ogive (plotted with the axis of
x horizontal) is equal to the area under the corresponding frequency curve
from the lowest value of x up to the value corresponding to this ordinate.
Thus in Fig. 6, the area under the dotted curve up to x = 119.75 would be
the ordinate for a smooth ogive at the same value of x. If relative instead of
absolute frequencies are plotted, the ordinate gives the approximate
probability of a value of x less than 119.75 in the parent population.

Exercises 

1. If/(x) «= fce-**, show that /(x) 

What is the value of $f(0)$?

2. If $\phi(x) = ax^2 + bx + c$ and $\phi(x) = \phi(-x)$, show that $b = 0$.

3. If $f(x) = a^x$, show that $f(u) \times f(v) = f(u + v)$.

4. If $g(x) = \log\{(1 - x)/(1 + x)\}$, show that $g(u) + g(v) = g\{(u + v)/(1 + uv)\}$.

5. Toss four coins together and note the number of heads (x). Do this 50 times and
count the number of times that x = 0, 1, 2, 3, 4. Construct a frequency polygon to exhibit
these results.

6. Make a histogram for the data of Table 5 (§1.8). Regroup the data so that the class
interval is 1 inch, and make a new histogram.

7. Construct a histogram for the data of Table 9, Ex. 6, Chap. I, covering the deaths
of infants up to ages of 1 month. (Use the day as unit, and treat the month as 30 days.)

8. Draw a cumulative frequency polygon for the data of Table 9.

9. Construct a cumulative frequency polygon for the data of Table 12. Draw by eye a
smooth ogive. Estimate from the ogive the probability of a weight less than 130 lb in the
population from which this sample was taken.
CHAPTER III

THE MEDIAN AND OTHER QUANTILES

3.1 Averages. If 50 men and 50 women were selected at random from 
the inhabitants of any small town and their heights were measured, it is likely 
that, although both groups would show some variation, the men would be 
taller on the average than the women. An average is a value which is intended 
to be in some sense typical of a whole distribution. It is a more or less central 
value and may be regarded as a measure of location of the distribution on the 
axis of the variate x. Thus, if the two distributions (men and women sepa- 
rately) were plotted as histograms on the same axis they would overlap, but 
the one for men would be more to the right than the one for women. One of 
the simplest averages or measures of location is the median.

3.2 The Median. The median is defined as the central value of the
distribution, a value such that greater and smaller values occur with equal
frequency. If N values of x are arranged in order from the least to the greatest,
so that

$$x_1 \le x_2 \le x_3 \le \cdots \le x_N$$

the median is the value $x_k$ if N is odd ($N = 2k - 1$), and is not uniquely
defined if N is even ($N = 2k$) (unless $x_k = x_{k+1}$, in which case the common
value is the median). If $x_k < x_{k+1}$ the median is conventionally taken
halfway between, as $\tfrac{1}{2}(x_k + x_{k+1})$. Thus, for the numbers 5, 6, 10, 15, 18, 20, 25,
the median is 15, which is $x_4$. If we add another value 37, the median is
between 15 and 18 and is taken to be 16.5.
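The definition just given can be sketched as a short routine. The function name `median` is chosen here for illustration; for even N it uses the halfway convention described above.

```python
def median(values):
    """Median of ungrouped values; halfway convention when N is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    k = n // 2
    if n % 2 == 1:
        # N = 2k - 1 in the text's 1-based notation: the middle value.
        return ordered[k]
    # N = 2k: halfway between the two central values.
    return (ordered[k - 1] + ordered[k]) / 2


print(median([5, 6, 10, 15, 18, 20, 25]))      # 15
print(median([5, 6, 10, 15, 18, 20, 25, 37]))  # 16.5
```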

For a frequency distribution of a continuous variate, grouped in classes, the 
median is that value of x at which the ordinate divides the histogram into
two parts of equal area (Fig. 10). The median is sometimes denoted by $\tilde{x}$
(which suggests that it is a value of x) and sometimes by $M_d$.

As the central value in the distribution, the median is that x for which the relative
cumulative frequency is exactly 0.5. Thus, if the data of Table 3 are plotted
as a cumulative frequency polygon (as in Fig. 11), the median is that value
of x which corresponds to a cumulative frequency of 50 (since N = 100) or
a relative cumulative frequency of 0.5. On the assumption that, in the interval
between 69.5 and 79.5, the grades are evenly distributed, the median is the
abscissa of the point C. This is clearly equal to 69.5 + AB. But, by the
properties of similar triangles,

$$\frac{AB}{AE} = \frac{BC}{ED}$$

and we know that AE = 10, BC = 50 - 36 = 14, and ED = 68 - 36 = 32.
It follows that

$$AB = 140/32 = 4.4$$

so that

$$\tilde{x} = 69.5 + 4.4 = 73.9$$

Of course, in this particular example, we have in Table 2 the original data 
before grouping, and thus can determine the median exactly. By arranging 
the grades in order, we find that the 50th and 51st are both 75, so that this is 
the true median. The foregoing method is useful when we are given only 
the grouped distribution. 

Except for the purposes of illustration it is not necessary to plot the cumula- 
tive polygon in order to calculate the median. Thus for the same data 
(Table 8) we need only form a column of cumulative frequencies alongside
a column of upper class boundaries, and interpolate, as follows: 


Interval      f      End-x      F<
                     29.5        0
30-39         2      39.5        2
40-49         3      49.5        5
50-59        11      59.5       16
60-69        20      69.5       36
70-79        32     median   ←  50
                     79.5       68
80-89        25      89.5       93
90-99         7      99.5      100


Here, N/2 = 50. This value of $F_<$ corresponds to a value of x in the
interval 69.5-79.5. Therefore the median is 69.5 plus a fraction of the distance
from 69.5 to 79.5. Thus,


   x_e       F<
   69.5      36
  median     50
   79.5      68

Assuming that the items in any class interval are uniformly distributed over
that interval, it follows that the partial differences are proportional to the
total differences. That is,

$$\frac{\tilde{x} - 69.5}{79.5 - 69.5} = \frac{50 - 36}{68 - 36}$$

whence

$$\tilde{x} = 69.5 + 10\left(\tfrac{14}{32}\right) = 69.5 + 4.4 = 73.9$$

This process is called linear interpolation, or interpolation by proportional
parts, since it is equivalent to the geometrical method first described.
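The interpolation by proportional parts can be checked with a few lines of arithmetic. This is only a sketch of the calculation above, with variable names chosen for illustration; the numbers are those of the class 69.5 to 79.5.

```python
# Class containing the median: boundaries 69.5 and 79.5,
# with cumulative frequencies 36 and 68 at those boundaries.
lower, upper = 69.5, 79.5
F_lower, F_upper = 36, 68
target = 100 / 2  # N/2

# Partial difference over total difference, i.e. 14/32.
fraction = (target - F_lower) / (F_upper - F_lower)

# Median = lower boundary plus the same fraction of the class interval.
grouped_median = lower + (upper - lower) * fraction

print(round(grouped_median, 1))  # 73.9
```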

For a continuous variate corresponding to a random variable which is 
measured in certain units, the median is expressed in the same units. Thus, 
for a sample of heights in inches, the median is also expressed in inches. 
For a discrete variate the median may not have much meaning. Thus in
Table 11 we have N/2 = 2048, and the relevant portion of the cumulative
frequency table is

   x       F<       F≤
   5       695     1426
   6      1426     2374
   7      2374     3221


The 2048th value of x is clearly 6, which is the median according to the first 
definition of this section, but there are 948 observed values all equal to 6. 
There are, however, only 1426 values less than 6 and 1722 values greater 
than 6. 

3.3 Quartiles. Just as the ordinate at the median of a grouped distribution
divides the histogram into two parts of equal area, so the ordinates at
the quartiles $Q_1$ and $Q_3$ cut off one-quarter of the area at each end (Fig. 10).

The first quartile, denoted by $Q_1$, is that value of x for which $F_< = N/4$.
That is, one-fourth of all the variates in the distribution are smaller in value
than $Q_1$ and three-fourths of them are larger than $Q_1$. The second quartile
$Q_2$ is that value of x for which $F_<$ is $N/2$ and is therefore the median. The
third quartile, denoted by $Q_3$, is that value of x for which $F_< = 3N/4$. Hence
50% of the total frequency is included between $Q_1$ and $Q_3$. On the relative
cumulative frequency polygon for a continuous variate, the ordinates at
$Q_1$, $Q_2$, $Q_3$ are 0.25, 0.50, and 0.75, respectively.

Like the median, the quartiles are calculated by interpolation in the cumu- 
lative frequency table, as illustrated by the following example. 

Example. (a) Find the median and the quartiles for the distribution of IQ's in Table 7
(§1.11). (b) Illustrate the measures found in (a) by means of an $F_<$ graph.


    x_e       F<
    54.5        0
    64.5        3
    74.5       24
    84.5      102
               ← Q_1
    94.5      284
               ← Q_2
   104.5      589
               ← Q_3
   114.5      798
   124.5      879
   134.5      900
   144.5      905 = N

Solution:

$$N/4 = 226.25, \qquad N/2 = 452.5, \qquad 3N/4 = 678.75$$

$$\frac{Q_1 - 84.5}{10} = \frac{226.25 - 102}{284 - 102}, \qquad Q_1 = 91.33$$

$$\frac{Q_2 - 94.5}{10} = \frac{452.5 - 284}{589 - 284}, \qquad Q_2 = 100.02$$

$$\frac{Q_3 - 104.5}{10} = \frac{678.75 - 589}{798 - 589}, \qquad Q_3 = 108.79$$

$$Q = \frac{Q_3 - Q_1}{2} = 8.73$$



Fig. 12 explains graphically the measures obtained by interpolation from
an $F_<$ table. For convenience in drawing the figure, the quartile labels are
put on vertical lines. But one should remember that the quartiles are values 
of X and that it is the horizontal distances of the lines from the y-axis that 
represent these measures. 
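The interpolation used in the example can be written once for any fraction of the total frequency. The sketch below assumes the class boundaries and cumulative frequencies of the IQ example; the function name `grouped_quantile` and its arguments are illustrative, not part of the text.

```python
def grouped_quantile(boundaries, cum_freq, fraction):
    """Value of x at which the cumulative frequency equals fraction * N.

    boundaries: class boundaries in increasing order.
    cum_freq:   cumulative frequency ("less than") at each boundary, starting at 0.
    fraction:   a number strictly between 0 and 1 (0.25 for the first quartile).
    """
    if not 0 < fraction < 1:
        raise ValueError("fraction must lie strictly between 0 and 1")
    target = fraction * cum_freq[-1]
    for i in range(1, len(boundaries)):
        if cum_freq[i] >= target:
            lower, upper = boundaries[i - 1], boundaries[i]
            below, above = cum_freq[i - 1], cum_freq[i]
            # Linear interpolation within the class containing the target.
            return lower + (upper - lower) * (target - below) / (above - below)
    raise ValueError("cumulative frequencies never reach the target")


# Boundaries and cumulative frequencies of the IQ example (N = 905).
bounds = [54.5, 64.5, 74.5, 84.5, 94.5, 104.5, 114.5, 124.5, 134.5, 144.5]
cum = [0, 3, 24, 102, 284, 589, 798, 879, 900, 905]

q1 = grouped_quantile(bounds, cum, 0.25)
q2 = grouped_quantile(bounds, cum, 0.50)
q3 = grouped_quantile(bounds, cum, 0.75)
half_range = (q3 - q1) / 2  # half the distance between the outer quartiles

print(round(q1, 2), round(q2, 2), round(q3, 2), round(half_range, 2))
```

The same routine gives deciles or percentiles by passing, say, 0.3 or 0.65 as the fraction.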

3.4 The Quartile Deviation. Half the distance between $Q_1$ and $Q_3$ is
called the semi-interquartile range or the quartile deviation and may be denoted
by Q. Thus,

$$Q = (Q_3 - Q_1)/2$$

The median is not necessarily midway between $Q_1$ and $Q_3$ (see Fig. 13),
although this will be so for a symmetrical distribution.

For a variable which is measured in certain units (such as a height in
inches), the quartile deviation is expressed in the same units.
The purpose of calculating the quartile deviation is to have a measure of
the dispersion of the distribution, that is, of the way in which it is spread out
along the x-axis. Some distributions show a marked tendency to bunch
around a central value, whereas others are more nearly uniformly spread over
a finite interval. Fig. 14 shows that two distributions may have the same
median and total frequency, yet differ considerably in dispersion. The
different quartile deviations for these two indicate the difference in dispersion.

Fig. 14. Distributions with Different Dispersions

3.5 Quantiles. We can calculate many other quantities* similar to the
quartiles, but corresponding to different fractions of the total frequency.
These statistics are collectively called quantiles, or sometimes fractiles. For
example, in educational and psychological work deciles are often calculated.
The deciles $D_1$ to $D_9$ are values of the variate x (e.g., scores on an intelligence
test) such that one-tenth of all the values obtained lie below $D_1$, one-tenth
between $D_1$ and $D_2$, and so on. Thus we can say that any person who has
taken the test is in the top (or some other) tenth of the whole group tested,
and thus form a readier appraisal of his performance. With a large sample
it may be useful to calculate percentiles. The kth percentile $P_k$ is that value
of x, say $x_k$, which corresponds to a cumulative frequency of $Nk/100$. The

* A quantity such as a median, quartile deviation, etc., which is calculated from the
observed data, is often called a "statistic." This is a useful technical sense of the term, but
one which is quite distinct from the various meanings of "statistics" discussed in §0.1.
See also §7.9.
50th percentile is obviously the median, the 25th percentile is $Q_1$, the 20th
percentile is $D_2$, and so on. The calculation of any percentile by linear
interpolation is precisely similar to that of the quartiles.

One objection to the quartile deviation as a measure of dispersion is that
it is not very sensitive to differences affecting mainly the ends of the
distribution. All the x values lying beyond the class interval containing $Q_3$, for
instance, might be increased arbitrarily without changing the position of $Q_3$
at all. For this reason it is often felt that the distance from $P_7$ to $P_{93}$ is a
better measure of dispersion, since comparatively few observations lie outside
these limits. Of course, the whole range from the least to the greatest value
of x could be used, but it suffers from the defect of being unduly sensitive to
sampling variation. That is, the range depends a great deal on the chance
presence or absence in the sample of a few extreme values of the variate.

3.6 Percentile Ranks. The percentile rank of $P_k$ is k. Thus, instead of
saying that the 20th percentile is 57, we could say that the percentile rank of
57 is 20. Both statements mean that in the particular sample studied 20%
of the individuals had a value of the variate x less than 57. If $F_<$ is the
cumulative frequency corresponding to the kth percentile, then

$$F_</N = k/100$$

The relation between percentiles and percentile ranks is that between abscissas 
and ordinates of a percentage cumulative frequency polygon, as illustrated 
in Fig. 15. 



Fig. 15. Percentiles and Percentile Ranks

The calculation of a percentile rank is similar to that of a percentile, but
the interpolation is in the column of cumulative frequency rather than in the
column of upper class boundaries. Suppose we want to find the percentile
rank of 110 in the data of the example in §3.3; that is, we want the percentage
of the school children studied with an IQ below 110. Now, 110 lies in the
interval between 104.5 and 114.5, and the values of $F_<$ for the ends of this
interval are 589 and 798. Hence the value of $F_<$ corresponding to 110 is
given by

$$\frac{F_< - 589}{798 - 589} = \frac{110 - 104.5}{114.5 - 104.5} = 0.55$$

so that $F_< = 704$. This, as a percentage of 905, is 77.8, which is the required
percentile rank. It would usually be rounded off as 78.

In the field of education, percentile ranks are often referred to as grades.
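The percentile-rank calculation interpolates in the opposite direction from a percentile: the value of x is given and the cumulative frequency is sought. The sketch below assumes the boundaries and cumulative frequencies of the IQ example in §3.3; `percentile_rank` is an illustrative name.

```python
def percentile_rank(value, boundaries, cum_freq):
    """Percentage of the sample below `value`, by linear interpolation."""
    if not boundaries[0] <= value <= boundaries[-1]:
        raise ValueError("value lies outside the range of the class boundaries")
    total = cum_freq[-1]
    for i in range(1, len(boundaries)):
        if value <= boundaries[i]:
            lower, upper = boundaries[i - 1], boundaries[i]
            # Interpolate in the cumulative-frequency column.
            share = (value - lower) / (upper - lower)
            below = cum_freq[i - 1] + share * (cum_freq[i] - cum_freq[i - 1])
            return 100 * below / total


# Boundaries and cumulative frequencies of the IQ example (N = 905).
bounds = [54.5, 64.5, 74.5, 84.5, 94.5, 104.5, 114.5, 124.5, 134.5, 144.5]
cum = [0, 3, 24, 102, 284, 589, 798, 879, 900, 905]

rank = percentile_rank(110, bounds, cum)
print(round(rank, 1))  # 77.8
```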

3.7 Approximate Characterization of a Distribution by Quantiles. If in a
large sample we know the median, the quartiles, and the two extreme values, 
we can form a pretty good idea of the whole distribution by plotting these 
five points and sketching in a percentage cumulative frequency curve joining 
them. As shown in Fig. 16, these five points are equally spaced along the 



Fig. 16. Catch of Soles, Aberdeen, 1912-1937

vertical axis. The data of the figure refer to the total catch of soles* at 
Aberdeen, Scotland, over a period of 26 years (units not stated). 

The catch in 1937, marked by an asterisk, is seen to correspond to a
percentile rank of 96. On the basis of the 26 years studied we should expect the
annual catch to exceed this value (75) only about once in 25 years (4%).

* D'Arcy W. Thompson, Nature, 144, 1939, p. 445.

If a set of data is too scanty to permit forming a frequency distribution,
an approximate ogive can be drawn by arranging the variate values in order,
giving them cumulative frequencies of ½, 1½, and so on, and plotting these
frequencies (expressed as percentages) against the corresponding x. Thus,
Table 13 gives the maximum annual flow* in the North Saskatchewan River


Table 13. Maximum Annual Flow of North Saskatchewan River at Edmonton
(Percentage of Mean)

    x       F<      %F<          x       F<      %F<
   66.1     0.5      2.8       102.5     9.5     52.8
   67.0     1.5      8.3       105.0    10.5     58.3
   73.2     2.5     13.9       113.7    11.5     63.9
   76.3     3.5     19.4       116.8    12.5     69.4
   83.8     4.5     25.0       117.8    13.5     75.0
   85.6     5.5     30.6       120.6    14.5     80.6
   88.1     6.5     36.1       123.4    15.5     86.1
   98.4     7.5     41.7       124.0    16.5     91.7
   99.2     8.5     47.2       138.8    17.5     97.2


for each of 18 years, placed in increasing order. The value x = 83.8 is given
the cumulative frequency of 4.5 because there are 4 values less than, say,
83.7 and 5 values less than 83.9. The value 83.8 is, so to say, split in two
and reckoned half with the lower values and half with the higher values.
The plotted points do not lie on a smooth curve, but an approximate ogive
may be sketched so as to lie fairly evenly between them (Fig. 17). This
ogive may be used to estimate the chances of a flow of any given size.

* The flow is expressed as a percentage of the mean (see Chapter IV). The units do not
matter for the present purpose.
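The cumulative frequencies of Table 13 follow a simple rule: the value of rank r among N ordered values receives r minus one-half. A minimal sketch, assuming all values are distinct (ties are not treated here) and using the flows as listed in the table; `ogive_points` is an illustrative name.

```python
# Maximum annual flows of Table 13, as percentages of the mean.
flows = [66.1, 67.0, 73.2, 76.3, 83.8, 85.6, 88.1, 98.4, 99.2,
         102.5, 105.0, 113.7, 116.8, 117.8, 120.6, 123.4, 124.0, 138.8]


def ogive_points(values):
    """Return (x, cumulative frequency, percentage) for each ordered value.

    Each value is split in two: half is counted with the lower values and
    half with the higher values, so the r-th value gets r - 0.5.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [(x, rank - 0.5, 100 * (rank - 0.5) / n)
            for rank, x in enumerate(ordered, start=1)]


points = ogive_points(flows)
for x, cumulative, percent in points:
    print(x, cumulative, round(percent, 1))
```

Plotting the percentage against x and sketching a smooth curve between the points gives the approximate ogive.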

Exercises 

1. Calculate the median and quartiles, and the quartile deviation, for the distribution of
Table 12 (§2.5).

2. Calculate the median income in Table 6 (§1.8). Note that the median can be found
even though there are open classes at the ends of the distribution.

3. Use the cumulative frequency polygon of Exercise 8 (Ch. II) to find approximate values
of the median and quartiles of age at death of infants dying under 1 year. Note that this
distribution is very asymmetrical (skew) and that consequently $Q_3 - Q_2$ is very different
from $Q_2 - Q_1$.

4. Explain why the median is found from interpolating in the end-x column ($x_e$) and not
in the mid-x column ($x_c$).

5. Criticize the following "definitions":

$$Q_1 = N/4, \qquad Q_2 = N/2, \qquad Q_3 = 3N/4.$$

6. Compute the 3rd decile and the 65th percentile for the distribution of Table 12 (§2.5).

7. Find the percentile ranks of 120 lb and 200 lb in the distribution of Table 12.
Interpret your answers.




CHAPTER IV 

THE ARITHMETIC MEAN AND OTHER AVERAGES 

4.1 Various Averages. As already pointed out in §3.1, an average of
a distribution is a more or less typical value of the variable, used to character- 
ize the location of the distribution as a whole. It is in some sense a central 
value of the distribution, although it need not actually be in the domain of the 
variable. Thus the average number of children in a group of families may be 
2.7, although obviously the number of children in a family cannot be fractional. 

One average, the median, has already been considered, but there are several 
others in common use. The most important by far is the arithmetic mean, 
and a considerable portion of this chapter will be devoted to it. We shall also 
describe briefly the mode, the geometric mean, the harmonic mean, and the root
mean square. As a preliminary we introduce some notation which will be 
useful. 

4.2 Notation for Sums and Products. If x denotes a variable, then
$x_1, x_2, \ldots, x_N$ are symbols for the values which x may take. When we are
concerned with a sum like the following,

$$x_1 + x_2 + x_3 + x_4 + \cdots + x_i + \cdots + x_N$$

it is customary to designate it by placing the Greek capital letter $\Sigma$ (sigma)
before the general term, thus*

$$\sum_{i=1}^{N} x_i = x_1 + x_2 + \cdots + x_i + \cdots + x_N$$

The symbol $\sum$ is a sort of mathematical verb, and the notation written
above and below it may be called adverbs. Mathematicians call $\sum$ an
operator and speak of the "adverbs" as limits. When $\sum_{i=1}^{N}$ is placed before any
quantity, it means, "add up all quantities like this which are formed by giving
i the values of every positive integer from i = 1 to i = N, inclusive." Thus
if $x_i$ stands for a value of the variate in Table 2, $x_1$ refers to the first value 76,
$x_2$ refers to the second value 80, etc., and $x_{100}$ refers to the last value 56. Here
N = 100. Hence the compact notation $\sum_{i=1}^{100} x_i$ denotes the sum of all the values
in Table 2. The symbol $\sum_{i=1}^{N} x_i$ is read, "the summation of x-sub-i, i varying
(or running) from one to N." The subscript i is called the index of
summation. Any letter may be used as an index but it is conventional to use i or j.
Also the upper limit may be denoted by any letter, but we shall use N to
denote the total number of observed values (some of which may be alike)
in a set.

* The symbols $Sx$ and $S(x)$ may also be used instead of $\sum x$.

If a variable x is to take on the particular values, 1, 2, 3, etc., instead of
the general values $x_1, x_2, x_3$, etc., then x itself becomes the index of summation
and we write x = 1 underneath $\sum$. Thus

$$\sum_{x=1}^{N} x = 1 + 2 + 3 + \cdots + N$$

Frequently the index of summation is understood from the context and the
notation at the top and bottom of $\sum$ may be omitted if no ambiguity results.

Illustrations:

1. $$\sum_{i=1}^{N} 3x_i = 3x_1 + 3x_2 + \cdots + 3x_N = 3(x_1 + x_2 + \cdots + x_N).$$

2. $$\sum_{i=1}^{5} (x_i + c) = (x_1 + c) + (x_2 + c) + (x_3 + c) + (x_4 + c) + (x_5 + c) = (x_1 + x_2 + x_3 + x_4 + x_5) + 5c.$$

3. $$\sum_{i=1}^{4} x_i f_i = x_1 f_1 + x_2 f_2 + x_3 f_3 + x_4 f_4.$$

4. $$\sum_{i=1}^{4} x_i y_i = x_1 y_1 + x_2 y_2 + x_3 y_3 + x_4 y_4.$$

5. $$\sum_{x=1}^{N} x^2 = 1^2 + 2^2 + 3^2 + \cdots + N^2.$$

The following theorems may be proved very simply by writing out the
summations in full:

Theorem 1. $$\sum_{i=1}^{N} (x_i + y_i - z_i) = \sum_{i=1}^{N} x_i + \sum_{i=1}^{N} y_i - \sum_{i=1}^{N} z_i$$

Theorem 2. If c is a constant,

$$\sum_{i=1}^{N} c x_i = c \sum_{i=1}^{N} x_i$$

Theorem 3. $$\sum_{i=1}^{N} c = c + c + \cdots + c = Nc$$

The next two theorems are concerned with the summing of positive integers
and their squares.
Theorem 4. $$\sum_{x=1}^{N} x = N(N + 1)/2$$

This follows from the fact that the sum is an arithmetic progression with
first term 1 and common difference 1.

Theorem 5. $$\sum_{x=1}^{N} x^2 = N(N + 1)(2N + 1)/6$$

Proof: Let us take the identity $x^3 - (x - 1)^3 = 3x^2 - 3x + 1$, and sum
each side for x = 1 to N. Thus,

$$\sum_{x=1}^{N} [x^3 - (x - 1)^3] = \sum_{x=1}^{N} [3x^2 - 3x + 1]$$

Applying Theorems 1-3 to the right member we have

$$\sum_{x=1}^{N} [x^3 - (x - 1)^3] = 3\sum_{x=1}^{N} x^2 - 3\sum_{x=1}^{N} x + N$$

Performing the indicated sum in the left member, we have

$$\sum_{x=1}^{N} [x^3 - (x - 1)^3] = (1^3 - 0^3) + (2^3 - 1^3) + \cdots + [N^3 - (N - 1)^3] = N^3$$

Therefore

$$N^3 = 3\sum_{x=1}^{N} x^2 - 3\sum_{x=1}^{N} x + N$$

Hence, using Theorem 4 and simplifying,

$$\sum_{x=1}^{N} x^2 = \frac{2N^3 + 3N(N + 1) - 2N}{6} = \frac{N(N + 1)(2N + 1)}{6}$$
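Theorems 4 and 5, and the telescoping step used in the proof, can be checked numerically for a range of N. This is a verification sketch only, not a proof; the function names are illustrative.

```python
def sum_of_integers(n):
    """Closed form of Theorem 4: 1 + 2 + ... + n."""
    return n * (n + 1) // 2


def sum_of_squares(n):
    """Closed form of Theorem 5: 1^2 + 2^2 + ... + n^2."""
    return n * (n + 1) * (2 * n + 1) // 6


for n in range(1, 201):
    xs = range(1, n + 1)
    # Compare each closed form with the sum written out in full.
    assert sum(xs) == sum_of_integers(n)
    assert sum(x * x for x in xs) == sum_of_squares(n)
    # Telescoping sum from the proof: all terms cancel except n^3.
    assert sum(x**3 - (x - 1)**3 for x in xs) == n**3

print(sum_of_integers(100), sum_of_squares(10))  # 5050 385
```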

If we wish to denote the product, instead of the sum, of the values
$x_1, x_2, \ldots, x_N$, we use the Greek capital letter $\Pi$ (pi), thus

$$\prod_{i=1}^{N} x_i = x_1 x_2 \cdots x_N$$

Some simple theorems follow immediately from this definition; for example:

Theorem 6. $$\prod_{i=1}^{N} (x_i y_i) = \prod_{i=1}^{N} x_i \prod_{i=1}^{N} y_i$$

Theorem 7. $$\prod_{i=1}^{N} (c x_i) = c^N \prod_{i=1}^{N} x_i$$

Theorem 8. $$\prod_{x=1}^{N} x = 1 \cdot 2 \cdot 3 \cdots N$$ (which is called "N factorial").

The product notation is not used as frequently in elementary work as the
sum notation.

4.3 Arithmetic Mean. The arithmetic mean of the values $x_1, x_2, \ldots, x_N$ is
their sum divided by N. If we denote the arithmetic mean by $\bar{x}$ (read as
"x-bar"),

$$\bar{x} = (x_1 + x_2 + \cdots + x_N)/N = \frac{1}{N}\sum_{i=1}^{N} x_i \tag{4.1}$$

Thus for the set of grades in Table 2, we find

$$\bar{x} = 7266/100 = 72.66$$

Computing the mean* strictly according to definition (4.1) may be called the
serial method to distinguish it from other methods which will be presented.
This definition is used when N is so small that a grouping of the values into
a frequency distribution is not desirable.

A considerable amount of arithmetic may often be saved by mentally sub- 
tracting a suitable number from all the values of x before adding them. The 
same number is added to the mean after it has been computed. Thus, if 
the values are 171, 173, 175, 181, 189, 196, 197, 200, we subtract 170 from 
each and find the mean of the remainders 1, 3, 5, 11, 19, 26, 27, 30. This is 
122/8 = 15.25. The mean of the original numbers is therefore 185.25. The 
validity of this procedure follows from the relation

$\bar{x} = a + \frac{1}{N} \sum_{i=1}^{N} (x_i - a)$

where $a$ is the number subtracted.
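The device of subtracting a convenient number before averaging can be sketched in a few lines. The function name `mean_by_shift` is our own; the data are the eight values quoted above.

```python
def mean_by_shift(values, shift):
    """Subtract `shift` from every value, average the remainders, add `shift` back."""
    remainders = [v - shift for v in values]
    return shift + sum(remainders) / len(remainders)


values = [171, 173, 175, 181, 189, 196, 197, 200]

shifted = mean_by_shift(values, 170)   # mean of the remainders 1, 3, 5, ..., plus 170
serial = sum(values) / len(values)     # definition (4.1) applied directly

print(shifted, serial)
```

Any other number could be subtracted in place of 170; the result is the same, and only the size of the numbers to be added changes.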

4.4 Weighted Arithmetic Mean. It will be noticed that several of the
grades given in Table 2 are alike. For example, 80 occurs seven times. It
should be evident that the same result would be found for the mean if, instead
of summing the individual values, each value were first multiplied by the
frequency with which it occurs and all such products were then added. In
general, if the values $x_1, x_2, \cdots, x_k$ occur with corresponding frequencies
$f_1, f_2, \cdots, f_k$, respectively, where $f_1 + f_2 + \cdots + f_k = N$, it follows that

$\bar{x} = \frac{x_1 f_1 + x_2 f_2 + \cdots + x_k f_k}{f_1 + f_2 + \cdots + f_k}$

or, in shorter notation,

(4.2) $\bar{x} = \frac{1}{N} \sum_{i=1}^{k} x_i f_i$, where $N = \sum_{i=1}^{k} f_i$

* When there is no ambiguity, the arithmetic mean is often referred to as the mean. 
Here each value $x_i$ is said to be weighted, the weight being the corresponding
frequency, and the arithmetic mean so obtained is called a weighted arithmetic
mean. The term originated in experimental science where readings are
sometimes "weighted" according to their estimated reliability. The mean of
three independent measurements of a quantity may be taken to be three
times as reliable as a single measurement and given a weight 3, compared
with weight 1 for a single reading. In this case also, the weight is essentially
a frequency.

The ordinary arithmetic mean may be regarded as a weighted mean in
which all the weights are equal. If the x's are added individually, the f's
become unity, and equation (4.2) reduces to (4.1). The student should
notice that, for the same data, $\sum_{i=1}^{k} f_i x_i$ is numerically equal to
$\sum_{i=1}^{N} x_i$. He should
also observe that N refers to the total number of values in the set (some of
which may be alike), whereas k refers to the number of different values of x
in the set and hence to the number of products of the form $f_i x_i$, where $f_i$ is
the number of times $x_i$ occurs. In the following example, N = 8 and k = 4.

Example. For the values 6, 8, 7, 6, 5, 7, 6, 5,

$\sum_{i=1}^{8} x_i = x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7 + x_8 = 6 + 8 + 7 + 6 + 5 + 7 + 6 + 5 = 50$

$\sum_{i=1}^{4} f_i x_i = f_1 x_1 + f_2 x_2 + f_3 x_3 + f_4 x_4 = 2 \cdot 5 + 3 \cdot 6 + 2 \cdot 7 + 1 \cdot 8 = 50$, $\sum f_i = 8$.

By either method, $\bar{x} = 50/8 = 6.25$.

The general formula for the weighted arithmetic mean is $\bar{x} = \sum w_i x_i / \sum w_i$,
where the weights $w_i$ need not be frequencies.
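The agreement of the serial and the weighted methods may be checked by machine. In this sketch the function name `weighted_mean` is our own, and the data are the eight values of the example, whose sum is 50.

```python
from collections import Counter


def weighted_mean(values, weights):
    """General weighted mean: sum of w*x divided by sum of w."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


data = [6, 8, 7, 6, 5, 7, 6, 5]          # N = 8 individual values

# Tally the k = 4 distinct values and their frequencies.
frequency = Counter(data)
distinct = sorted(frequency)              # [5, 6, 7, 8]
freqs = [frequency[x] for x in distinct]  # [2, 3, 2, 1]

serial = sum(data) / len(data)            # definition (4.1)
weighted = weighted_mean(distinct, freqs) # definition (4.2)
print(serial, weighted)
```

The same function serves when the weights are not frequencies, since nothing in it requires them to be whole numbers.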

4.5 Arithmetic Mean of a Grouped Variate. For the purpose of calculating
the arithmetic mean for a continuous variate grouped in classes, we
assume that all the items falling in the same class interval have the same
value of the variate, namely, the class mark for that interval. This is not
the assumption we make in calculating the median, that is, that the values
are uniformly spread out over the interval, but as far as the calculation of
the arithmetic mean is concerned the two assumptions are equivalent.* Our
mean is therefore a weighted mean, the $x_i$ being the class marks and the $f_i$
the class frequencies. Table 14 illustrates the calculation for the data of
Table 3 (§1.8).

Since in Table 2 we have the original ungrouped data, we can calculate the 
arithmetic mean exactly. The result is 72.66 and shows that our assumption 

* On the assumption of uniform spread, any value in the upper half-interval is matched
by a corresponding value in the lower half-interval at the same distance from the class mark.
The sum of these two is just twice the class mark, and therefore is the same as the sum of
two values each equal to the class mark.
Table 14

Class Limits   Class Mark $x_c$   Frequency $f$    Product $f x_c$
30-39          34.5               2                69.0
40-49          44.5               3                133.5
50-59          54.5               11               599.5
60-69          64.5               20               1290.0
70-79          74.5               32               2384.0
80-89          84.5               25               2112.5
90-99          94.5               7                661.5
Totals                            $\sum f = 100$   $\sum f x_c = 7250.0$

$\bar{x} = 7250/100 = 72.50$

in this example causes little error. With fairly large samples (at least two
or three hundred), for which the class frequencies tail off gradually at both
ends of the distribution, the "grouping error" of the mean caused by making
this assumption is hardly ever serious, unless the class intervals are
quite broad.
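The grouped calculation is simply a weighted mean with class marks as values and class frequencies as weights. A minimal sketch follows; the function name is our own, and the class marks and frequencies are those of the seven classes 30-39 to 90-99.

```python
def grouped_mean(class_marks, frequencies):
    """Mean of a grouped variate, every item being placed at its class mark."""
    total = sum(f * x for x, f in zip(class_marks, frequencies))
    return total / sum(frequencies)


class_marks = [34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
frequencies = [2, 3, 11, 20, 32, 25, 7]

grouped = grouped_mean(class_marks, frequencies)

# The ungrouped grades give 72.66; the difference is the grouping error.
grouping_error = grouped - 72.66
print(grouped, round(grouping_error, 2))
```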

The calculation of the mean may be considerably simplified arithmetically 
by a suitable change of the variate values. This change is equivalent geo- 
metrically to shifting the origin of reference and at the same time altering 
the scale. 

When a frequency distribution is represented by a graph, we have seen in
Chapter II that the variate values are used as abscissas or measurements
along the x-axis. The mean is therefore the point on the x-axis whose coordinates
are $(\bar{x}, 0)$. Its position may be emphasized by drawing a vertical line
through this point, but it is the horizontal distance of the point from the
origin, and not the vertical line, which represents graphically the mean.

If new axes $x'$, $y'$ are taken parallel to the old axes, $x$, $y$, with positive directions
preserved, the axes are said to be translated from one position to the
other. A translation of axes corresponds to a transformation of coordinates.
Thus if we let

$x' = x - x_0$, $y' = y - y_0$

the origin is translated to the point $(x_0, y_0)$. Since our variate is denoted by
$x$ we are concerned here only with the transformation $x' = x - x_0$ which
translates the origin to the point $(x_0, 0)$. The new values $x'$ are often called
deviations. The units of measurement remain unchanged. Obviously, any
values that are larger than $x_0$ will be positive in terms of $x'$ and any values
smaller than $x_0$ will be negative in terms of $x'$.
We can now alter the scale by letting $c$ units of the variate $x'$ ($= x - x_0$)
equal 1 unit of a new variate $u$. The number expressing an observation in
terms of $u$ will therefore be $1/c$ times as large as it would be in terms of $x'$,
so that*, if $c \neq 0$,

(4.3) $u = (x - x_0)/c$

The relation between $\bar{x}$ and $\bar{u}$ is expressed by

Theorem 9.

(4.4) $\bar{x} = c\bar{u} + x_0$

Proof: By (4.3), $x = cu + x_0$. Substituting in (4.2) we have

$\bar{x} = \frac{1}{N} \sum f_i (c u_i + x_0) = c \cdot \frac{1}{N} \sum f_i u_i + x_0 \cdot \frac{1}{N} \sum f_i$

The first term is $c\bar{u}$, by definition, and the second is $x_0$, since $\sum f_i = N$. The
theorem follows.

Returning to Table 14, we see that if we choose $x_0 = 64.5$ (the class mark
of the middle class) and choose $c = 10$ (the class interval) the numbers
$u = (x - 64.5)/10$ become simple integers arranged in order. The products
$fu$ are easily multiplied mentally, and then $\bar{x}$ is given by (4.4). The procedure
is illustrated in Table 15, which shows the so-called "fully coded"
method of calculating $\bar{x}$. (The variate is coded to a new scale.) It is simply


Table 15. Mean of 100 Grades, with Class Interval as Unit

$x_c$     $u$     $f$     $fu$
34.5      -3      2       -6
44.5      -2      3       -6
54.5      -1      11      -11
64.5      0       20      0
74.5      1       32      32
84.5      2       25      50
94.5      3       7       21
Totals            100     80

$\bar{u} = 80/100 = 0.8$
$\bar{x} = 64.5 + 10\bar{u} = 72.5$


a device to save arithmetic. Note that any of the class marks may be chosen
as the value of $x_0$, but it is convenient to choose one in the region of the higher
frequencies. This insures that the larger numbers in the $f$ column will be
multiplied by numerically small values of $u$.

* This is the same kind of relation as that between the centigrade and Fahrenheit scales
of temperature, namely, $C = \frac{5}{9}(F - 32)$. Here the scale-factor $c$ is $\frac{9}{5}$.
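The fully coded method can be sketched as follows. The function name `coded_mean` is our own; the class marks and frequencies are those of the 100 grades, with $x_0 = 64.5$ and $c = 10$.

```python
def coded_mean(class_marks, frequencies, x0, c):
    """Code each class mark as u = (x - x0)/c, average the u's, then decode."""
    if c == 0:
        raise ValueError("the scale factor c must not be zero")
    u_values = [(x - x0) / c for x in class_marks]
    u_bar = sum(f * u for u, f in zip(u_values, frequencies)) / sum(frequencies)
    return c * u_bar + x0   # the relation x-bar = c * u-bar + x0


class_marks = [34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
frequencies = [2, 3, 11, 20, 32, 25, 7]

print(coded_mean(class_marks, frequencies, x0=64.5, c=10))
# Another choice of origin changes the u's but not the decoded mean.
print(coded_mean(class_marks, frequencies, x0=74.5, c=10))
```

On a machine the coding saves nothing, but the sketch confirms that the result does not depend on the particular $x_0$ chosen.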

If the class intervals are unequal, the full benefit of coding cannot be obtained,
but a suitable choice of $x_0$ and $c$ will usually considerably simplify the
numbers which have to be multiplied. Thus in Table 16 (fictitious data), we
may take $x_0 = 74.5$, $c = 5$, giving the $u$ values shown in the fourth column.


Table 16. Coded Calculation of Mean with Unequal Intervals

Class Limits   $f$     $x_c$     $u$     $fu$
10-19          1       14.5      -12     -12
20-29          5       24.5      -10     -50
30-39          12      34.5      -8      -96
40-49          25      44.5      -6      -150
50-99          47      74.5      0       0
100-199        55      149.5     15      825
200-499        46      349.5     55      2530
500-999        9       749.5     135     1215
Totals         200                       4262

$\bar{u} = 4262/200 = 21.31$
$\bar{x} = 74.5 + 106.55 = 181$


4.6 Mean of Means. One of the great advantages of the mean as an
average is its amenability to mathematical treatment. For example, we
shall prove a theorem connecting the mean of two sets of values, when combined
into a single set, with the means of the two sets taken separately. In
order to be able to generalize to more numerous subsets, it will be convenient
to extend the subscript notation, and use $x_1, x_2, \cdots$ to mean different variates
and not different values of a single variate. Then we can use a second subscript
for the values. If the variate $x_1$ has $n_1$ observed values, we can denote
these by

$x_{11}, x_{12}, \cdots, x_{1n_1}$

and similarly the $n_2$ values of $x_2$ are denoted by

$x_{21}, x_{22}, \cdots, x_{2n_2}$

($x_{11}$ is read "x one one" and not "x eleven", etc.)
The mean of the first set is

(4.5) $\bar{x}_1 = \frac{1}{n_1} \sum_{i=1}^{n_1} x_{1i}$

and the mean of the second set is

(4.6) $\bar{x}_2 = \frac{1}{n_2} \sum_{j=1}^{n_2} x_{2j}$
Theorem 10. The mean of the combined sets, consisting of $n_1$ values of $x_1$
and $n_2$ values of $x_2$, is

(4.7) $\bar{x} = [n_1 \bar{x}_1 + n_2 \bar{x}_2]/N$, $N = n_1 + n_2$

Proof: From (4.5) and (4.6)

$n_1 \bar{x}_1 + n_2 \bar{x}_2 = \sum_{i=1}^{n_1} x_{1i} + \sum_{j=1}^{n_2} x_{2j}$

If we denote "either $x_1$ or $x_2$" by $x$, $x$ can take on $n_1 + n_2$ values, namely, the
$n_1$ values of $x_1$ and the $n_2$ values of $x_2$, and the sum of these may be denoted
by $\sum_{k=1}^{N} x_k$, the values from $k = 1$ to $n_1$ being values of $x_1$ and the rest being values
of $x_2$. If $n_1 + n_2 = N$, the combined mean is

$\bar{x} = \frac{1}{N} \sum_{k=1}^{N} x_k = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$

This result may be generalized as:

Theorem 11. The mean of a set of N values which is composed of k subsets,
the frequency in the ith subset being $n_i$, is

(4.8) $\bar{x} = \frac{1}{N} \sum_{i=1}^{k} n_i \bar{x}_i$, $N = \sum_{i=1}^{k} n_i$

Corollary. If $n$ is the same for all the sets, then $N = kn$ and (4.8)
reduces to

(4.9) $\bar{x} = \frac{1}{k} \sum_{i=1}^{k} \bar{x}_i$
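The theorem on the mean of combined sets says that only the size and the mean of each subset are needed, not the individual values. A small sketch, with a function name of our own and two made-up subsets chosen purely for illustration:

```python
def combined_mean(sizes, means):
    """Mean of the pooled set, given the size n_i and the mean of each subset."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


# Hypothetical data: two subsets of unequal size.
first = [2, 4, 6]
second = [10, 20]

sizes = [len(first), len(second)]
means = [sum(first) / len(first), sum(second) / len(second)]

from_means = combined_mean(sizes, means)
pooled = sum(first + second) / len(first + second)
print(from_means, pooled)
```

Note that the plain average of the two subset means would be correct only if the subsets were of equal size, which is the case covered by the corollary.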



4.7 The Mode. That value of the variable which occurs most frequently
in a distribution is called the mode. In a sense it is the "fashionable" value
(one meaning of "mode" is fashion) and is the kind of average meant in such
a phrase as "the average man." If a unimodal distribution can be fitted
with a smooth frequency curve, the mode is defined as the abscissa of the
highest point on this curve. For the kinds of curves usually fitted, this point
can be calculated mathematically, but the calculation requires more advanced
mathematics than simple algebra. We shall denote the mode by $\hat{x}$ or sometimes
by $M_o$.

In a given frequency distribution for a discrete variate, the mode can be
immediately picked out by inspection. Thus for Table 10 (§2.3) the mode is
2, since the frequency is greater for $x = 2$ than for any other value of $x$. The
difficulty of calculation arises only with a continuous variate. We can easily
pick out the modal class, which for a distribution with equal class intervals is
the class having the greatest frequency,* but the position of the mode within
the class is harder to fix.

Sometimes the class mark of the modal class is used, but this is a
poor approximation unless the histogram is almost symmetrical. A
somewhat better method is illustrated in Fig. 18, applicable when the class
intervals in the neighborhood of the mode are all equal.

The method uses three adjacent rectangles of the histogram, with the
tallest in the middle. The mode is the abscissa of the point M at which
AB and CD intersect. It can be shown that this is also the abscissa
of the vertex of a parabola passing through the three points, P, Q, R
which are the midpoints of the tops of these three rectangles. If $f_{-1}$,
$f_0$, $f_1$ are the frequencies represented by the three rectangles, $c$ is the class
interval, and $x_0$ is the class mark of the modal class, the formula for calculating
the mode is

(4.10) $\hat{x} = x_0 + \frac{c}{2} \cdot \frac{f_1 - f_{-1}}{2f_0 - f_1 - f_{-1}}$

Fig. 18. Approximate Mode

To show this let us change the variate to $u = (x - x_0)/c$. Then the centers
of the three class intervals are $u = -1, 0, 1$, respectively. If the abscissa
of the mode is $\hat{u}$, it is evident from the figure that $LM = \hat{u} + \frac{1}{2}$, $MN = \frac{1}{2} - \hat{u}$.
From the geometry of the similar triangles ACM, BDM, we have

$\frac{LM}{MN} = \frac{CA}{BD}$

But since the heights of the rectangles are proportional to the frequencies,

$\frac{CA}{BD} = \frac{f_0 - f_{-1}}{f_0 - f_1}$

Hence,

$\frac{\hat{u} + \frac{1}{2}}{\frac{1}{2} - \hat{u}} = \frac{f_0 - f_{-1}}{f_0 - f_1}$

Clearing of fractions and collecting terms, we have

$\hat{u} = \frac{1}{2} \cdot \frac{f_1 - f_{-1}}{2f_0 - f_1 - f_{-1}}$

Finally we return to the $x$ variate by writing $\hat{x} = x_0 + c\hat{u}$, which gives (4.10).

* With unequal intervals, the modal class is that with the greatest frequency per unit of
interval. In Table 16, it is the class 40-49.
As an illustration we may take Table 12, §2.5, where $x_0 = 134.75$, $c = 10$,
$f_{-1} = 196$, $f_0 = 248$, $f_1 = 197$.

By substitution in (4.10) we find $\hat{x} = 134.75 + 5(1/103) = 134.8$ so that
the approximate mode is 134.8 lb. It happens in this case that $\hat{x}$ is practically
equal to the class mark, since the two frequencies adjacent to the modal class
are almost exactly equal.
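The approximate mode is easily computed once the modal class and its two neighbors are known. In the sketch below the function and argument names are our own; the rule coded is the class mark plus half the class interval times $(f_1 - f_{-1})/(2f_0 - f_1 - f_{-1})$, and the numbers are those of the illustration just given.

```python
def approximate_mode(x0, c, f_prev, f_modal, f_next):
    """Approximate mode from the modal class and its two equal-width neighbors.

    x0 is the class mark of the modal class, c the class interval,
    f_prev, f_modal, f_next the three adjacent class frequencies.
    """
    denominator = 2 * f_modal - f_next - f_prev
    if denominator <= 0:
        raise ValueError("the middle class must have the greatest frequency")
    return x0 + (c / 2) * (f_next - f_prev) / denominator


mode = approximate_mode(x0=134.75, c=10, f_prev=196, f_modal=248, f_next=197)
print(round(mode, 1))
```

When the two neighboring frequencies are equal the correction vanishes and the mode falls at the class mark, as one would expect from the symmetry of the three rectangles.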

Sometimes a distribution has two modes, as illustrated in Fig. 19, which
represents the distribution of number of petals in flowers of a certain species
of chrysanthemum. There are clearly two humps (disregarding small irregularities),
one near $x = 23$ and the other near $x = 33$. A distribution like
this is often regarded as evidence of heterogeneity in the population. Thus
if a frequency curve were drawn for heights of a large sample of adult males,
about half American and half Japanese, the curve would similarly be bimodal.
It would really be a superposition of two unimodal frequency curves with
different modes, the mode for the Americans being higher than that for the
Japanese. Possibly in the distribution of Fig. 19 we have a mixture of
two varieties of the same species, the one tending to have more petals than
the other, although in both varieties the number of petals is variable to
some extent.

Fig. 19. Number of Petals in Chrysanthemum leucanthemum

4.8 Relation Between Mean, Median, and Mode. If a distribution is 
represented by a histogram, an ordinate through the median divides the area 
into two equal parts. An ordinate through the mean passes through the 
centroid of the area; that is to say, if the histogram were cut out of a thin 
homogeneous metal plate and held in a horizontal plane it would balance 
about a knife-edge along this ordinate. As already pointed out, an ordinate 
through the mode (if there is only one mode) passes through the highest point 
of the frequency curve which fits the distribution. 
Fig. 20 shows the position of the three averages in a moderately skew distribution.
If the distribution were perfectly symmetrical then all three of
these measures of location would coincide.

There is an interesting empirical relationship between the three quantities which
appears to hold for unimodal curves of moderate asymmetry, namely,

mean - mode = 3 (mean - median)

It is a useful mnemonic to observe that the mean, median, and mode occur in the
same order (or reverse order) as in the dictionary; and that the median is nearer to the
mean than to the mode, just as the corresponding words are nearer together in the dictionary.*

Fig. 20


4.9 Relative Merits of Mean, Median, and Mode. The student primarily
interested in the use of these averages in practical statistics might reasonably
inquire, "Which of the three averages mentioned should be used in a given
problem?" The answer depends upon certain properties peculiar to each
average and upon the nature of the data to be averaged.

The qualities desired in an average may be listed as follows. The average
should be:

(a) rigorously defined,

(b) easily computed,

(c) capable of a simple interpretation,

(d) dependent on all the observed values,

(e) not unduly influenced by one or two extremely large or small values,

(f) likely to fluctuate relatively little from one random sample to another
(of the same size and from the same population),

(g) capable of mathematical manipulation.

These qualities are satisfied in varying degrees by the averages already described.
The most important ones from the standpoint of the ordinary man
are (b) and (c), and here the median rates high, but the most important from
the point of view of mathematical statistics are (f) and (g). Unfortunately
the student at this stage will find it hard to appreciate why this should be
so, and will have to take it on trust that the arithmetic mean is distinctly
superior in these two, and in most other, respects. A glimpse of what is meant
by (g) was given by Theorems 10 and 11. No such simple results as these
can be stated for the median or the mode.

* M. G. Kendall, The Advanced Theory of Statistics, Lippincott, vol. I, p. 35.





The median has an advantage over the mean in three situations: 

(1) When there is an open class at one or both ends of the distribution (as 
in Table 6, §1.8) the arithmetic mean cannot be calculated but the median can. 

(2) When exceptionally large or small values occur at the ends of the
distribution, the median may be much more "typical" than the mean. Thus,
the median of 41, 43, 46, 48, 49, 52 and 141 is 48 but the mean is 60.

(3) When the observations cannot be measured numerically but can be 
ranked in order, the middle one (or the two middle ones) can be readily picked 
out, but the other averages have no meaning. 
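The effect of a single extreme value on the two averages, as in situation (2), can be seen directly. This sketch uses Python's standard `statistics` module; the seven values are those quoted above, and the comparison with the extreme value dropped is our own addition.

```python
from statistics import mean, median

values = [41, 43, 46, 48, 49, 52, 141]

print(mean(values), median(values))

# Dropping the one extreme value moves the mean a good deal, the median hardly at all.
trimmed = values[:-1]
print(mean(trimmed), median(trimmed))
```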

The mode satisfies condition (c) above, but has little meaning unless the 
sample is large. It is the appropriate average if we want the “usual” value, 
the one “most in demand,” as in some questions of marketing. It is not, 
however, easy to compute. 

If we are primarily interested in estimating the average of a parent popula- 
tion from the average of a sample (an important practical problem in sta- 
tistics) the mean is usually the proper average to use, as it is more efficient* 
than any other in the sense suggested by condition (/) above when the popu- 
lation approximates the “normal” type described in Chapter VIII. 

4.10 Geometric Mean. The geometric mean of a set of N positive values
is the positive Nth root of their product. For the N values, $x_1, x_2, \cdots, x_N$,

(4.11) $M_G = [x_1 \cdot x_2 \cdots x_N]^{1/N}$

or

(4.12) $(M_G)^N = \prod_{i=1}^{N} x_i$

Thus the geometric mean of 4 and 9 is 6, which is the positive square root
of 36. If any of the $x_i$ are zero, $M_G$ is zero, and if one of them is negative
$M_G$ may be imaginary. The geometric mean is never actually used except
for positive numbers.

The simplest way of calculating the geometric mean, usually, is to use
logarithms. From (4.11)

(4.13) $\log M_G = [\log x_1 + \log x_2 + \cdots + \log x_N]/N$

or

(4.14) $N \log M_G = \sum_{i=1}^{N} \log x_i$

Therefore the logarithm of the geometric mean is the arithmetic mean of
the logarithms of the values themselves.

* See Reference 2 at the end of the chapter.
Example. Find the geometric mean of 7.96, 13.82, 22.95, 35.34.

Solution.

log 7.96  = 0.90091
log 13.82 = 1.14051
log 22.95 = 1.36078
log 35.34 = 1.54827
        4 | 4.95047
log $M_G$ = 1.23762
$M_G$ = 17.28
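The logarithmic calculation of the geometric mean may be sketched as follows. The function name is our own, and common (base 10) logarithms are used to match the tabular work; the four values are those whose logarithms are listed in the solution.

```python
import math


def geometric_mean(values):
    """Antilogarithm of the arithmetic mean of the common logarithms."""
    if any(v <= 0 for v in values):
        raise ValueError("the geometric mean is used only for positive numbers")
    mean_log = sum(math.log10(v) for v in values) / len(values)
    return 10 ** mean_log


values = [7.96, 13.82, 22.95, 35.34]
print(round(geometric_mean(values), 2))

# The two-value illustration: the geometric mean of 4 and 9.
print(geometric_mean([4, 9]))
```

Working through logarithms also avoids forming the product itself, which may become inconveniently large when N is large.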

The geometric mean may not satisfy condition (c) of §4.9, but it is better
than the arithmetic mean as regards (e), and, unlike the median, it does depend
on all the observations. Thus the A.M. of 100, 100, 100, and 1000 is
325 but the G.M. is 177.8. For this reason the geometric mean is often preferred
to the arithmetic mean in averaging bacterial counts on agar plates
for the purpose of judging milk samples. The bacterial population may
easily jump to a very high value in the occasional sample, and the geometric
mean is considered the more typical average.

Sometimes when a frequency distribution is graphed it is seen to be quite 
skew, descending fairly rapidly to the x-axis on the left but stretching out in 
a long tail to the right. In such a case it often happens that when log x is 
used as the variate instead of x, the distribution appears much more nearly 
symmetrical.* The arithmetic mean of the new variates (which is the log- 
arithm of the geometric mean of the old variates) would be appropriate for 
this type of distribution. 

The geometric mean is the natural average for ratios. Suppose we want to
compare two values of the ratio of $x$ and $y$ (say the net worth and the debt
for two firms); we get an equivalent result by using the geometric mean
whether we consider the ratio $x/y$ or the ratio $y/x$. This is not true for the
arithmetic mean. For example in the table

$x$     $y$     $x/y$    $y/x$
50      20      2.5      0.4
10      8       1.25     0.8

the A.M. of $x/y$ is 1.875 and that of $y/x$ is 0.6, which is not the reciprocal of
1.875. The G.M. of $x/y$ is, however, 1.77 and that of $y/x$ is 0.565, which is
the reciprocal of 1.77. Since we feel that the average ought not to depend on
the particular way we choose to express the ratio, it seems reasonable to use
the geometric mean. Cost-of-living and other index numbers are weighted
averages of ratios and here, too, the geometric mean is often used (Chapter V).
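That the geometric mean, unlike the arithmetic mean, gives reciprocal averages for reciprocal ratios can be verified numerically. The helper names are our own; the ratios are 2.5 and 1.25 with their reciprocals 0.4 and 0.8, and `math.prod` requires Python 3.8 or later.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    return math.prod(values) ** (1 / len(values))


x_over_y = [2.5, 1.25]
y_over_x = [1 / r for r in x_over_y]      # 0.4 and 0.8

am_product = arithmetic_mean(x_over_y) * arithmetic_mean(y_over_x)
gm_product = geometric_mean(x_over_y) * geometric_mean(y_over_x)

# A product of 1 means the two averages are reciprocals of each other.
print(round(am_product, 3), round(gm_product, 3))
```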

4.11 Weighted Geometric Mean. If the observations $x_1, x_2, \cdots, x_k$ have
weights $f_1, f_2, \cdots, f_k$, the weighted geometric mean is given by

* Several examples are given in an article by J. H. Gaddum, Nature, 156, 1945,
pp. 463-466.
(4.15) $(M_G)^N = x_1^{f_1} x_2^{f_2} \cdots x_k^{f_k}$, $N = f_1 + f_2 + \cdots + f_k$

or

(4.16) $N \log M_G = \sum_{i=1}^{k} f_i \log x_i$, $N = \sum_{i=1}^{k} f_i$

This arises in finding the average rate of compound interest over a period of
years in which a sum of money has accumulated partly at one rate and partly
at another. If a sum $P is invested for $n_1$ years at $I_1$% and then the amount
to which it has accumulated is invested for $n_2$ years at $I_2$%, the accumulated
amount $A is given by

$A = P(1 + i_1)^{n_1}(1 + i_2)^{n_2}$

where $i_1 = I_1/100$, $i_2 = I_2/100$. At the rate $I$% for the whole period

$A = P(1 + i)^{n_1 + n_2}$, $i = I/100$

and by equating the two values of A we see that $1 + i$ is the weighted geometric
mean of $1 + i_1$ and $1 + i_2$.

If the same problem is worked at simple interest, we find that $i$ is the
weighted arithmetic mean of $i_1$ and $i_2$.
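The compound-interest interpretation can be tried out with figures. The rates and periods below (4% for 3 years, then 6% for 5 years) and the principal are hypothetical, chosen only for illustration, and the function name is our own.

```python
import math


def equivalent_rate(rates, years):
    """Rate i such that 1 + i is the weighted geometric mean of the 1 + i_k,
    the weights being the numbers of years n_k."""
    total_years = sum(years)
    mean_log = sum(n * math.log(1 + i) for i, n in zip(rates, years)) / total_years
    return math.exp(mean_log) - 1


rates = [0.04, 0.06]   # hypothetical rates i_1, i_2
years = [3, 5]         # hypothetical periods n_1, n_2
principal = 1000.0     # hypothetical sum P

i = equivalent_rate(rates, years)
two_stage = principal * (1 + rates[0]) ** years[0] * (1 + rates[1]) ** years[1]
single_rate = principal * (1 + i) ** sum(years)

# At simple interest the equivalent rate is the weighted arithmetic mean.
simple_rate = sum(n * r for r, n in zip(rates, years)) / sum(years)
print(round(i, 5), round(simple_rate, 5))
```

The two accumulated amounts agree, and the compound rate comes out slightly below the simple-interest rate, as the relation between geometric and arithmetic means would lead one to expect.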

4.12 The Law of Growth. The amount of a sum of money at a fixed
rate of interest is one example of a quantity which increases according to an
exponential law,

(4.17) $y = ar^x$

The terms corresponding to $x = 1, 2, 3, \cdots$ form a geometric progression with
common ratio $r$. In the compound interest illustration $r$ is equal to $1 + i$.
The same law is followed by any quantity which increases at a fixed rate 
proportionally to itself. Since such a rate of increase is characteristic of 
biological populations with abundant food supplies and no overcrowding 
(for example, a bacterial culture in its early stages, or the population of a 
country with a rapidly expanding frontier), this exponential law has been 
called the law of growth. 

If we wish to average a set of values taken from a population following
approximately such a law, the geometric mean is the proper one to take.
Given that the population of a city was 98,000 in 1940 and 129,000 in 1950,
and with no further information, having to estimate the population in 1945,
the best estimate would be $(98{,}000 \times 129{,}000)^{1/2} \approx 112{,}000$. The average
annual rate of increase is obtained from (4.17) by solving the equation

$129{,}000 = 98{,}000\, r^{10}$

which gives $r = 1.028$. The average annual rate of increase is therefore about 2.8%.
The population for any other year can then be estimated, but great caution
should be exercised about extrapolation outside the decade given, as it can
seldom be assumed that the conditions governing the growth of a city remain
unaltered for long periods.
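The two calculations for the city population, the mid-decade estimate by the geometric mean and the annual ratio obtained from $129{,}000 = 98{,}000\, r^{10}$, can be carried out as below. The variable and function names are our own.

```python
import math

p_1940 = 98_000
p_1950 = 129_000

# Mid-decade estimate: the geometric mean of the two census figures.
p_1945 = math.sqrt(p_1940 * p_1950)

# Annual ratio r from p_1950 = p_1940 * r**10.
r = (p_1950 / p_1940) ** (1 / 10)


def estimate(year):
    """Population in a given year under the law y = a * r**x, with x counted from 1940."""
    return p_1940 * r ** (year - 1940)


print(round(p_1945), round(r, 4), round(estimate(1945)))
```

The estimate for 1945 obtained from the exponential law coincides with the geometric mean, since five years is exactly half the decade.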

4.13 Harmonic Mean. Another average which has long been known
and which is required in certain problems is the harmonic mean ($M_H$). For
the N positive values $x_1, x_2, \cdots, x_N$, it is defined as the reciprocal of the arithmetic
mean of the reciprocals of the values. In symbols,

(4.18) $\frac{1}{M_H} = \frac{1}{N}\left(\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_N}\right)$

This measure is used in averaging ratios, such as rates and prices, when certain
conditions are agreed upon.

Many ratios can be expressed either as x/y or as y/x. The price of eggs 
can be put as so many cents per dozen or so many eggs for a dollar. If 100 
bushels of wheat are exchanged for 175 dollars, the price of wheat is 175/100
dollars per bushel or 100/175 bushels per dollar. The correct average of 
such prices will be the arithmetic mean if the unit of the denominator is re- 
garded as fixed and the numerator as variable; but it will be the harmonic 
mean if the unit of the numerator is regarded as fixed and the denominator 
as variable. 

Suppose we wish to average k ratios $r_i = x_i/y_i$ ($i = 1, 2, \cdots, k$). The
average is the total amount of $x$ divided by the total amount of $y$, that is,
$\sum x_i / \sum y_i$. First, let us regard the unit of $y$ as fixed and express all the
ratios with a common $y$, equal to $v$. Then $x_i = r_i y_i = r_i v$. Hence,

$\sum x_i / \sum y_i = v \sum r_i / (kv) = (\sum r_i)/k = \bar{r}$

which is the arithmetic mean of the $r_i$. If, however, we regard the unit of
$x$ as fixed, $y_i = x_i/r_i$, and all the $x_i$ are equal, with a common value $u$. Then

$\sum x_i / \sum y_i = ku / (u \sum 1/r_i) = k / \sum (1/r_i)$

which is the harmonic mean of the $r_i$.

Example 1. A tourist purchases gasoline at three filling stations, where the prices are
33 1/3, 25, and 20 cents per gallon. What is the average price?

Solution. If we regard a gallon as the fixed unit, the average is the arithmetic mean,
26.1 cents/gallon, but if we regard a dollar as the fixed unit, the average is the harmonic
mean 3/(3/100 + 1/25 + 1/20) = 25 cents/gallon. The former would be the correct
average price actually paid by the tourist if he bought the same number of gallons at each
station. The latter would be the correct average if he spent the same sum (say $1) at each
station. It corresponds to the arithmetic mean of the prices expressed as gallons per dollar,
namely, 3, 4, and 5 gallons/dollar.
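Both answers to the gasoline problem can be obtained by computing total cost divided by total gallons under each supposition. This sketch uses the standard `statistics` module; the variable names, and the particular amounts of one gallon and one dollar per station, are our own choices for illustration.

```python
from statistics import harmonic_mean, mean

prices = [100 / 3, 25, 20]   # cents per gallon at the three stations

# Supposition 1: the same number of gallons is bought at each station.
gallons_each = 1.0
equal_gallons = sum(p * gallons_each for p in prices) / (gallons_each * len(prices))

# Supposition 2: the same sum (100 cents) is spent at each station.
spend_each = 100.0
equal_spend = (spend_each * len(prices)) / sum(spend_each / p for p in prices)

print(round(equal_gallons, 1), round(mean(prices), 1))
print(round(equal_spend, 1), round(harmonic_mean(prices), 1))
```

In each case the direct calculation of cents paid per gallon received reproduces the corresponding mean, which is the justification for the choice of average.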
Example 2. In a certain factory a unit of work is completed by A in 4 minutes, by B in
5 minutes, by C in 6 minutes, by D in 10 minutes, and by E in 12 minutes. What is their
average rate of working? At this rate how many units will they complete in a 6-hour day?

Solution. The rates are here expressed as times per unit of work. If we regard the output
of work per unit of time as the important consideration, the harmonic mean should
be used.

$M_H = 5/(\frac{1}{4} + \frac{1}{5} + \frac{1}{6} + \frac{1}{10} + \frac{1}{12}) = 300/48 = 6\frac{1}{4}$

That is, the average rate of working is 6 1/4 minutes per unit. In 360 minutes the five men
together will complete 5 × 360/6.25 = 288 units of work. This result can be obtained also
by considering that the men separately will do 90, 72, 60, 36, and 30 units in 360 minutes,
and the sum of these is 288 units. The use of the harmonic mean is therefore justified.

It has been pointed out* that in computing the velocity of a fireball from
the path length $s$ and the duration of flight $t$ by the equation $v = s/t$, the
harmonic mean of a number of observations is more natural than the arithmetic
mean. This is because $s$ is much better determined than $t$. The
path length is computed by triangulation from the observed positions of 
beginning and ending, which with experienced observers are fairly well de- 
fined. The time is largely a matter of retroactive estimation, and the begin- 
ning is difficult to observe accurately because it comes as a surprise. Hence 
the error is mainly in t, and we are averaging times per unit of distance rather 
than distances per unit of time. 

4.14 Relation of Arithmetic, Geometric, and Harmonic Means. The
geometric mean occupies a middle-of-the-road position between the arithmetic
and harmonic means. It is not as sensitive as the A.M. to a few very high
values or as sensitive as the H.M. to a few very low ones. Also, for any set
of positive numbers,

A.M. $\geq$ G.M. $\geq$ H.M.

This is easily proved for two numbers, say $a$ and $b$. If $a = b$ the three means
are all equal, so that we may take $a > b$. Denoting the means temporarily
by $A$, $G$, $H$, we have

$A = (a + b)/2$, $G = \sqrt{ab}$, $H = 2ab/(a + b)$

Now $(\sqrt{a} - \sqrt{b})^2 > 0$, so that, squaring out,

$a + b - 2\sqrt{ab} > 0$

whence

(4.19) $(a + b)/2 > \sqrt{ab}$

or

$A > G$

* W. J. Fisher, Harvard College Observatory Circular No. 876, 1932.
Again, on multiplying (4.19) through by $2\sqrt{ab}/(a + b)$, we obtain

$\sqrt{ab} > 2ab/(a + b)$

or

$G > H$

Hence for any two positive numbers $a$ and $b$,

$A \geq G \geq H$

(As a mnemonic, this is the order of the letters in the alphabet.)

It is a little more difficult to prove the relation for N numbers, $x_1, x_2, \cdots, x_N$.
By definition,

$A = (x_1 + x_2 + \cdots + x_N)/N$

$G^N = x_1 \cdot x_2 \cdots x_N$

$N/H = 1/x_1 + 1/x_2 + \cdots + 1/x_N$

If all the $x_i$ are equal, $A = G = H$. If not, take the least ($x_j$) and the greatest
($x_k$) and replace each by $(x_j + x_k)/2$. (There may, of course, be several
numbers all equal to $x_j$ or $x_k$; we take any one of them.) This process leaves
$A$ unaltered but changes $G$ to $G_1$, replacing $x_j x_k$ in the product $G^N$ by $(x_j + x_k)^2/4$, which
by (4.19) is greater than $x_j x_k$. Hence $G_1 > G$. Also the numbers are
now nearer to each other than they were before. We continue this process
as long as any $x_i$ remain unequal, obtaining a sequence of values
$G < G_1 < G_2 < G_3 < \cdots$ and at each step the range of the N numbers is
equal to or less than before. The limit of the process is a set of N numbers
all equal to each other and to $A$. Therefore $G < A$.

The process may terminate in a finite number of steps or the limit may
never actually be reached. In the latter case, however, we can continue as
long as we like. Suppose $G_1 - G = d$, which is positive. Let us continue
until the difference between the greatest and least of the N numbers is less
than $d$. The difference between the geometric mean (say $G_n$) and the arithmetic
mean ($A$) of these numbers will certainly be less than $d$, and $G_n - G \geq d$.
Hence $G < A$.

A similar argument may be used to prove that $G > H$. We replace $1/x_j$
and $1/x_k$ by $(1/x_j + 1/x_k)/2$. The details are left to the student. The
proof is not important in itself, but it provides drill in careful reasoning.
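The equalizing process used in the proof can be followed step by step on a machine. The sketch below is an illustration only, not a proof; the function names and the four starting values are our own.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


def equalize_step(values):
    """Replace the least and the greatest value by their average."""
    new = list(values)
    j = new.index(min(new))
    k = new.index(max(new))
    new[j] = new[k] = (new[j] + new[k]) / 2
    return new


values = [1.0, 4.0, 9.0, 16.0]       # hypothetical positive numbers
a_start = arithmetic_mean(values)

history = []                          # G, G_1, G_2, ... as the process runs
for _ in range(30):
    history.append(geometric_mean(values))
    values = equalize_step(values)

print(a_start, [round(g, 4) for g in history[:6]])
```

The arithmetic mean stays fixed throughout while the successive geometric means climb toward it, which is the behavior on which the argument rests.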

4*16 Root Mean Square. Another average that is sometimes used is 
the root mean square (R.M.S.) defined as the positive square root of the mean 
of the squares of the values x„ that is, 

(4.20) R.M.S. = = (i 'Zx^ 

The R.M.S. can be used when the $x_i$ are sometimes positive and sometimes
negative. It is, for example, the average used for an alternating electric
current. If values were measured for an alternating current at many instants
throughout a time covering many oscillations, the arithmetic mean would
be near zero. The effective current from the point of view of the consumer is
the root mean square current.
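As a small numerical illustration of (4.20), the sketch below samples a current over whole cycles. The sinusoidal waveform, the peak value of 10, and the sampling grid are assumptions made for the example; the text does not specify them.

```python
import math

def root_mean_square(values):
    """Positive square root of the mean of the squares, as in (4.20)."""
    return math.sqrt(sum(v * v for v in values) / len(values))

peak = 10.0                  # assumed peak current, arbitrary units
samples_per_cycle = 1000     # assumed sampling density
cycles = 5                   # whole number of oscillations
n = samples_per_cycle * cycles

# Assumed sinusoidal alternating current sampled at equal time steps.
current = [peak * math.sin(2 * math.pi * k / samples_per_cycle) for k in range(n)]

mean_current = sum(current) / n
rms_current = root_mean_square(current)

print(f"arithmetic mean = {mean_current:.6f}")
print(f"root mean square = {rms_current:.6f}")
print(f"peak / sqrt(2)   = {peak / math.sqrt(2):.6f}")
```

For a sinusoid the arithmetic mean over whole cycles is essentially zero, while the R.M.S. comes out at the peak value divided by $\sqrt{2}$.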

In theoretical statistics the chief application of the root mean square is
the measurement of deviations of the values $x_i$ of a variate from the mean $\bar{x}$.
The R.M.S. of these deviations is called the standard deviation and is by far
the most important measure of dispersion (Chapter VI).


Exercises 


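1. Write in expanded form:

   (a) [lost in source]
   (b) $\sum_{j=1}^{k} (x_j - \bar{x}) f_j$
   (c) $\sum_{i=1}^{N} (\ldots + c)$
   (d) [lost in source]
   (e) [lost in source]

2. Write in abbreviated notation:

   (a) $x_1 f_1 + x_2 f_2 + \cdots + x_k f_k$
   (b) $\frac{1}{N}\left[(x_1 - \bar{x})^2 f_1 + (x_2 - \bar{x})^2 f_2 + \cdots + (x_k - \bar{x})^2 f_k\right]$
   (c) $(a + 1)(a + 2)(a + 3) \cdots (a + N)$
   (d) $a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$

3. Prove:

   (a) $\sum_{i=1}^{N} (x_i - c) = \sum_{i=1}^{N} x_i - Nc$
   (b) $\sum_{i=1}^{k} (x_i + 1)^2 f_i = \sum_{i=1}^{k} x_i^2 f_i + 2\sum_{i=1}^{k} x_i f_i + N$
   (c) $\sum_{x=0}^{n} x(x - 1) p_x = \sum_{x=2}^{n} x(x - 1) p_x$

4. For the example of §4.4, compute $\sum (x_i - \bar{x}) f_i$, using the following form:

   $x_i$     $f_i$     $(x_i - \bar{x})$     $(x_i - \bar{x}) f_i$
   5
   6
   7
   8

5. Show by writing out the sums that $\sum_{i=1}^{N} x_i y_i$ is not the same as
$\left(\sum_{i=1}^{N} x_i\right)\left(\sum_{i=1}^{N} y_i\right)$.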

6. Use the identity $x^2 - (x - 1)^2 = 2x - 1$ to give a proof of Theorem 4 on the lines of
the proof given for Theorem 6.

7. Show that the arithmetic mean of the first $N$ integers is $(N + 1)/2$.

8. Rewrite Table 16, §4.6, using $x_0 = 74.5$, and verify that $\bar{x}$ is unaltered.

9. Find the arithmetic mean of 18, 42, 23, 16, 103, 61, 49, 96, 113, 10, using (4.1).
Find the deviations of each of these numbers from 60 and verify that the mean of these
deviations, plus 60, gives the mean of the numbers themselves.

10. Compute by the fully coded method of §4.[?] the arithmetic mean of the weights of
1000 students (Table 12, §2.5). Ans. 138.65 lb.

11. From Table 5, §1.8, find the mean monthly rainfall at Iowa City for the 36 years given.

12. Find the mean of the distribution of the discrete variate in Table 11 (§2.3).
Ans. $\bar{x} = 6.139$.

13. The mean grade of one class of 20 students is 66% and that of another class of
15 students is 70%. Find the mean grade of the two classes taken together.

14. Calculate the mean income of Canadian taxpayers from the data of Table 6, §1.8.
(Note that this cannot be done without further information because of the open classes at
the ends. Assume for the purpose of this question that the first class starts at $1000 and
that the last class ends at $250,000.)

15. Find the mean of the following distribution:

      x        f
    47.5       7
    48.1      17
    46.9      46
    44.0      44
    40.7      64
    41.6      43
    38.0      35
    33.2      14


16. In chemistry a student was graded 86 in class work, 80 in laboratory, and 65 in final
examination. If these were weighted 1, 2, and 3 respectively, what was the student's
average grade?

17. The population of a city increased in 5 years from 226,000 to 246,000. What was
the average annual rate of increase, assuming that the “law of growth” applied?

18. The number of bacteria in a certain culture was found to be 4 X 10® at noon of one 
day. At noon the next day the number was 9X10®. If the number increased at a con- 
stant rate per hour, how many bacteria were there at the intervening midnight? 

19. For five successive years the rates of interest on money were 4.26, 5.30, 4.65, 3.86,
and 4.38%. What was the average rate of interest? (Use the G.M. of $1 + r$.)

20. Show that if a sum of money $P$ accumulates at simple interest for $n_1$ years at $r_1$%
and for $n_2$ years at $r_2$%, the average rate of interest is $(n_1 r_1 + n_2 r_2)/(n_1 + n_2)$.
(Cf. §4.11.)

21. The following table gives the population of the United States at each 10-year census
from 1860 to 1940. (Historical Statistics of the United States, 1949.)

    Year     Population     Ratio to Preceding
             (Millions)     Figure
    1860        31.4
    1870        39.8            1.27
    1880        50.2            1.26
    1890        62.9            1.25
    1900        76.0            1.21
    1910        92.0            1.21
    1920       105.7            1.15
    1930       122.8            1.16
    1940       131.7            1.07

What is the average rate of increase per decade? Estimate the population of 1950 from the
1940 figure. Ans. 1.20, 156.7 millions. (The true value in 1950 was 150.7 millions.)

22. Given two sets, each of $n$ positive values, $x_{11}, x_{12}, \ldots, x_{1n}$ and
$x_{21}, x_{22}, \ldots, x_{2n}$, prove that the geometric mean of the ratios of
corresponding values in the two sets is equal to the ratio of the geometric means of
the two sets.

23. If $x_1$ and $x_2$ are two positive values of a variate, prove that their geometric mean is
equal to the geometric mean of their arithmetic and harmonic means.

24. (Amer. Math. Monthly, 42, 1935, p. 394) Show that if $2a$ is the harmonic mean of
two rational numbers $b$ and $c$, then the sum of the squares of the three numbers $a$, $b$, $c$
is the square of a rational number.

25. Find to four significant figures the arithmetic, geometric, and harmonic means of
the first 15 positive integers. Verify the relationship $A > G > H$.

26. (Burgess). Twenty boats make 6 transatlantic trips each per year, and 10 boats
make 4 trips each per year. What is the average number of days for a “turn around”
(that is, time between consecutive departures from the same port)? Take the year as
360 days for convenience.

Hint. The rates may be expressed either as trips per year (6 or 4), or as days per trip
(60 or 90), and the time unit may be regarded as fixed. The first method requires the
arithmetic mean and the second the harmonic mean. Both give the same result, 5⅓ trips
per year or 67.5 days per trip.

27. A plane travels one-half of a given distance, $D$ miles, at a speed of $x_1$ miles per hour,
and the remaining half distance at a speed of $x_2$ miles per hour. Show that the average
speed for the entire distance is the harmonic mean of $x_1$ and $x_2$. (If $x_1$ and $x_2$ are rates
for going and returning, half the average speed is the “radius of action per hour,” that is,
the distance that the plane could travel and return in one hour.)

28. Three boys work correctly 7, 10, and 15 arithmetic problems during a half-hour test.
Assuming that the problems are of about the same difficulty, what is the average rate at
which the boys work?

Hint. Is the significant measure of speed of working (a) the time a boy takes to work
one problem or (b) the number of problems he can work in a given time? If you prefer
(a), use the harmonic mean, if (b) the arithmetic mean. Authorities differ as to which is
the more reasonable interpretation.

29. Write out the details of the proof in §4.14 that $G > H$ for $N$ numbers.

30. With a large amount of data, the calculation of the geometric mean may be simplified 
by coding, as is done for the arithmetic mean. Show that if we choose class intervals with 
a constant ratio of the upper to the lower boundary, and form a frequency table, the coded 
method of §4.5 applies, with 

c , 

log ilfo = Xo -f ^ 

when Xo is midway between the logarithms of the upper and lower boundaries of the selected 
class interval, and c is the logarithm of the ratio between these boundaries. (See Refer- 
ence 3.) 

31. Apply the method of Exercise 30 to the data of Table 2, §1.8. (Here the highest grade
is 98 and the lowest 34, with a ratio of 2.88. If we want 7 classes the ratio of class boundaries
should be $(2.88)^{1/7} = 1.16$. If we take the lowest class boundary arbitrarily as 32.0, the
succeeding ones are 37.1, 43.1, 49.9, 57.9, 67.2, 77.9, 90.4, approximately.) Form the
frequency distribution and complete the calculation of the geometric mean.

32. Prove that for any set of numbers $x_1, x_2, \ldots, x_N$ the arithmetic mean is not greater
than the root mean square.

Hint. Prove this first for two numbers.

CHAPTER V

INDEX NUMBERS 

5.1 Index Numbers as Weighted Averages. The index number is a 
widely used statistical device for comparing one group of related variables 
with another group. We may wish to compare the prices of many different 
articles of food at one date with the prices of the same articles at a different 
date, and to express the comparison by a single food-price index number. Or 
we may compare the production of a group of farm crops in one country with 
that of the same crops in another country, obtaining a single crop-production 
index for the one country as compared with the other. Or, again, we may 
compare the scores of two individuals on a battery of tests, expressing their 
relative scores by a single number. All of these are examples of index numbers,
and it is evident that they are really averages of ratios. The many different 
price ratios for beef, pork, bread, sugar, etc., for example, are averaged to 
give the single price-index number for food. Moreover, the average must be 
a weighted average, because not all the items included in the group are equally 
important. In the construction of a food-index number it would clearly be 
unreasonable to give pepper and salt the same weight as bread and beef. 

In order to compute an index number it is necessary to collect a mass of 
data on prices, production, scores, or whatever is being compared. But it is 
also necessary to decide on the type of average to be used and the relative 
weighting of the different items in the group, and here there is room for con- 
siderable differences of opinion. Arithmetic, geometric, and harmonic means 
all have their advocates, and several different methods of weighting have 
been suggested. We shall describe some of these, as illustrations of the 
various averages described in Chapter IV, but we shall not enter into the 
important practical details of how the data are actually obtained, how accu- 
rate they are, or what governs the choice of items that make up the group. 

A related set of index numbers, such as a series of price-index numbers
computed over a period of months or years, is called an index. The most 
familiar example is the cost-of-living index, computed monthly by the United 
States, Canadian, and other governments. 

5.2 Price Index Numbers. Let us suppose that we wish to compute a
price index for nonferrous metals in the United States over a number of years,
and that we have the data from official sources on the prices of copper, zinc,
lead, aluminum, etc., over the period studied. We must choose a base year
such as 1939 with which the other years are to be compared, and we will call
this the year 0. Let the price of the copper be $p_0^{(1)}$ in the base year and
$p_n^{(1)}$ in the year $n$, which is one of the years for which we want the index number.
The quantity $p_n^{(1)}/p_0^{(1)}$ is called a price relative. It expresses the price in
year $n$ as a fraction of the price in year 0, and it is usually multiplied by 100
to make it a percentage. Similarly the price relative for zinc would be
$p_n^{(2)}/p_0^{(2)}$, and so for all the $N$ metals making up the selected group of
nonferrous metals. A simple arithmetic mean of these price relatives would
give a price index number, but such a procedure would give equal importance
to all the metals in the list. Instead it is customary to weight the price
relatives according to the value of metal produced, either in the base year or in
the current year, in both cases using base year prices.

Let $q_0^{(1)}$ be the quantity of copper produced in year 0, $q_n^{(1)}$ the quantity
produced in year $n$, and so on for the other metals. Then
$v_0^{(1)} = p_0^{(1)} q_0^{(1)}$ will be the value of the copper produced in year 0, and
$v_n^{(1)} = p_0^{(1)} q_n^{(1)}$ that of the production in year $n$ at the price of year 0.
Thus if $p_0^{(1)}$ is in dollars per ton and $q_0^{(1)}$ is in tons, $v_0^{(1)}$ will be in
dollars. Laspeyres' index number uses $v_0^{(i)}$ as the weight for the $i$th item (that is,
the value in the base year) and Paasche's index number uses $v_n^{(i)}$ (the value in the
current year at base year prices).

(5.1)    $P_L = \frac{\sum_{i=1}^{N} v_0^{(i)}\, p_n^{(i)}/p_0^{(i)}}{\sum_{i=1}^{N} v_0^{(i)}} = \frac{\sum_{i=1}^{N} p_n^{(i)} q_0^{(i)}}{\sum_{i=1}^{N} p_0^{(i)} q_0^{(i)}}$

(5.2)    $P_P = \frac{\sum_{i=1}^{N} v_n^{(i)}\, p_n^{(i)}/p_0^{(i)}}{\sum_{i=1}^{N} v_n^{(i)}} = \frac{\sum_{i=1}^{N} p_n^{(i)} q_n^{(i)}}{\sum_{i=1}^{N} p_0^{(i)} q_n^{(i)}}$
The symbol P is used since both these index numbers refer to prices. The 
subscripts L and P refer to Laspeyres and Paasche. 

The advantage of (5.1) is that once the weights have been determined they
stay fixed, and it is only the prices that change from year to year. In (5.2)
the weights have to be computed afresh for each value of $n$. It may well be
that if conditions of production are changing, the current quantities give a
more realistic picture than the base-year quantities, but, on the other hand,
the base year is usually selected as a typical year when conditions could be
regarded as approximately normal, and the use of base-year quantities gives
more stability. Sometimes an average over several years is used as a base.
The Canadian cost-of-living index was based on an average of the years 1935
to 1939.* The number $P_L$ measures the change in cost, due to change in
prices, of a fixed set of commodities in fixed amounts. It tends to be too high
as an index of prices because, in a market which is to some extent free,
consumers will tend to reduce their purchases of more expensive items and
increase those of less expensive ones. The number $P_P$ measures the change in
cost of the quantities actually purchased from what the same quantities would
have cost had they been purchased in the base year. The same argument
about a changing market shows that the Paasche index tends to be too low,
because the ratio is always in the direction current year to base year, and the
current consumption might overestimate the total of expenditure when applied
to base year prices. This does not mean that $P_L$ is always greater than $P_P$
as actually computed; frequently it is not so. The numbers relate to different
bases of reference.

* In 1952 a new consumer price index, based on 1949, was instituted.

To compromise between the two points of view we may use as weights the
arithmetic mean of $v_0^{(i)}$ and $v_n^{(i)}$, as in the Marshall-Edgeworth formula, or the
geometric mean, as in the Walsh formula (the names are those of writers who
have prominently advocated these formulas). Dropping the superscript $i$,
which is however to be understood, we may write these index numbers as

(5.3)    $P_{ME} = \frac{\sum (v_0 + v_n)\, p_n/p_0}{\sum (v_0 + v_n)} = \frac{\sum p_n (q_0 + q_n)}{\sum p_0 (q_0 + q_n)}$

(5.4)    $P_W = \frac{\sum \sqrt{v_0 v_n}\; p_n/p_0}{\sum \sqrt{v_0 v_n}} = \frac{\sum p_n \sqrt{q_0 q_n}}{\sum p_0 \sqrt{q_0 q_n}}$
Another compromise is to compute both $P_L$ and $P_P$ and take the mean,
either arithmetic or geometric. The former was recommended by Bowley
and the latter by Irving Fisher, who called it the “ideal” index number.

(5.5)    $P_B = (P_L + P_P)/2$

(5.6)    $P_F = \sqrt{P_L P_P}$
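The six price formulas discussed so far differ only in the quantities used as weights, and a short sketch may make the comparison concrete. The function names and the three-commodity data below are hypothetical, chosen for illustration and not drawn from the text.

```python
import math

def laspeyres(p0, pn, q0, qn):
    """(5.1): base-year quantities as weights."""
    return sum(p * q for p, q in zip(pn, q0)) / sum(p * q for p, q in zip(p0, q0))

def paasche(p0, pn, q0, qn):
    """(5.2): current-year quantities as weights."""
    return sum(p * q for p, q in zip(pn, qn)) / sum(p * q for p, q in zip(p0, qn))

def marshall_edgeworth(p0, pn, q0, qn):
    """(5.3): arithmetic mean of base and current quantities as weights."""
    num = sum(p * (a + b) for p, a, b in zip(pn, q0, qn))
    den = sum(p * (a + b) for p, a, b in zip(p0, q0, qn))
    return num / den

def walsh(p0, pn, q0, qn):
    """(5.4): geometric mean of base and current quantities as weights."""
    num = sum(p * math.sqrt(a * b) for p, a, b in zip(pn, q0, qn))
    den = sum(p * math.sqrt(a * b) for p, a, b in zip(p0, q0, qn))
    return num / den

def bowley(p0, pn, q0, qn):
    """(5.5): arithmetic mean of the Laspeyres and Paasche numbers."""
    return (laspeyres(p0, pn, q0, qn) + paasche(p0, pn, q0, qn)) / 2

def fisher(p0, pn, q0, qn):
    """(5.6): geometric mean of the Laspeyres and Paasche numbers."""
    return math.sqrt(laspeyres(p0, pn, q0, qn) * paasche(p0, pn, q0, qn))

# Hypothetical prices and quantities for three commodities.
p0 = [2.0, 5.0, 10.0]
pn = [3.0, 6.0, 18.0]
q0 = [100.0, 40.0, 10.0]
qn = [90.0, 45.0, 6.0]

results = {f.__name__: f(p0, pn, q0, qn)
           for f in (laspeyres, paasche, marshall_edgeworth, walsh, bowley, fisher)}
for name, value in results.items():
    print(f"{name:20s} {100 * value:7.2f}")
```

With these particular numbers every compromise formula falls between the Paasche and Laspeyres values; that is guaranteed for the Marshall-Edgeworth, Bowley, and Fisher numbers, but for Walsh it is a property of this data rather than a general rule.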


The formula that is perhaps the most frequently used practical compromise
is the fixed-weight aggregative. The weights are the values at base-year prices
of quantities $q_a$ which represent neither the actual base-year consumption
(or production) nor the actual current-year consumption, but measure in
some way the estimated relative importance of the items. Mitchell has
advocated this formula with weights which are the average quantities of the
commodities bought and sold over a period of years.

(5.7)    $P_M = \frac{\sum (p_0 q_a)\,(p_n/p_0)}{\sum (p_0 q_a)}$

or in aggregative form

         $P_M = \sum (p_n q_a) / \sum (p_0 q_a)$

As we shall see later, the Fisher index number has some technical advantages,
although it is harder to compute, and to interpret when computed, than most
of the others. The Marshall-Edgeworth index number is regarded as a good,
practical, all-round formula. The Mitchell index number suffers from the
fact that the fixed weights tend to become out of date.

With any index number it is useful to express the weights as fractions of
the total so that they add up to unity. For example, if $v_0' = v_0/\sum v_0$,

(5.8)    $P_L = \sum v_0' (p_n/p_0)$,

so that the index number is a sum of numbers each corresponding to one item
in the group of commodities which forms the basis of the index. The
contribution of each item toward the whole index number is therefore evident at
a glance.

5.3 An Example. The computation of a useful index number is likely to
be laborious because of the large number of commodities that need to be
included and the variety of prices (for different qualities, styles, etc.) within
a single commodity. About 900 commodities and about 1800 price quotations
are used in making up the United States Bureau of Labor Statistics
Index of Wholesale Commodity Prices, and this index number is computed
every week. Purely for the purposes of illustration, we take approximate
figures for Canadian prices and per capita consumption for a few selected
food items in 1939 and 1949. The data are presented in Table 17, where 1939
is reckoned as the base year and 1949 the current year for which an index
number is to be calculated. The calculation of the Laspeyres and Paasche
index numbers is shown in Table 17a, and it is evident that these are so close
together that it is hardly worth while to work out the index numbers which
will lie between them.

Table 17. Approximate Canadian Prices (cents per pound) and Per Capita
Consumption (pounds) of Selected Food Items, 1939 and 1949

    Food                 $p_0$ (1939)   $p_n$ (1949)   $q_0$ (1939)   $q_n$ (1949)
    Flour                    3.3            6.9          185            152
    Potatoes                 2.25           3.31         192            191
    Veal                    17.4           55.7           10.5           10.9
    Sugar                    6.4            9.7           94.7           98.1
    Coffee                  34.5           65.0            3.7            6.8
    Cheese (Cheddar)        26.8           61.4            3.4            3.3
    Breakfast food          18.6           30.1            7.4            5.9

(Adapted from data in Canada Year Book, 1950, and Labour Gazette, 1952)

Note. The published figures of prices and consumption are difficult to compare. They
generally refer to different classifications of the items, or to different periods of time. The
figures above are crude approximations.

Table 17a. Calculation of Index for Items of Table 17

    Food               $p_n/p_0$   $v_0$   $v_0' = v_0/\sum v_0$   $v_0' p_n/p_0$   $v_n$   $v_n' = v_n/\sum v_n$   $v_n' p_n/p_0$
    Flour                2.09       611         0.279               0.583        502         0.230               0.481
    Potatoes             1.47       432         0.197               0.290        430         0.197               0.290
    Veal                 3.20       183         0.084               0.269        190         0.087               0.278
    Sugar                1.51       606         0.277               0.418        628         0.288               0.435
    Coffee               1.89       128         0.058               0.110        235         0.108               0.204
    Cheese (Cheddar)     2.29        91         0.042               0.096         88         0.040               0.092
    Breakfast food       1.62       138         0.063               0.102        110         0.050               0.081
    Total                          2189         1.000               1.868       2183         1.000               1.861

    $P_L = 1.87$ or 187%
    $P_P = 1.86$ or 186%
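The arithmetic of Table 17a can be checked with a few lines of code. This is only a sketch: the helper name `weighted_mean_of_relatives` is an assumption, and the figures are those of Table 17 (base-year price, current price, base-year quantity, current quantity).

```python
# (p0, pn, q0, qn) for each food item of Table 17
table_17 = {
    "Flour":            (3.3,  6.9,  185.0, 152.0),
    "Potatoes":         (2.25, 3.31, 192.0, 191.0),
    "Veal":             (17.4, 55.7, 10.5,  10.9),
    "Sugar":            (6.4,  9.7,  94.7,  98.1),
    "Coffee":           (34.5, 65.0, 3.7,   6.8),
    "Cheese (Cheddar)": (26.8, 61.4, 3.4,   3.3),
    "Breakfast food":   (18.6, 30.1, 7.4,   5.9),
}

def weighted_mean_of_relatives(rows, weight):
    """Weighted arithmetic mean of the price relatives pn/p0.

    `weight` maps (p0, pn, q0, qn) to the value used as weight; the
    weights are normalized to add up to unity, as in (5.8).
    """
    weights = [weight(*row) for row in rows]
    total = sum(weights)
    return sum(w / total * (row[1] / row[0]) for w, row in zip(weights, rows))

rows = list(table_17.values())

# Laspeyres: weights v0 = p0 * q0 (base-year values)
P_L = weighted_mean_of_relatives(rows, lambda p0, pn, q0, qn: p0 * q0)
# Paasche: weights vn = p0 * qn (current quantities at base-year prices)
P_P = weighted_mean_of_relatives(rows, lambda p0, pn, q0, qn: p0 * qn)

print(f"P_L = {P_L:.3f}")
print(f"P_P = {P_P:.3f}")
```

Working from the unrounded products rather than the rounded table entries gives values that agree with the table totals to about two decimal places.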

5.4 Quantity Index Numbers. Instead of comparing prices we may be
interested in comparing the production (or consumption) from year to year
of the commodities making up the group. That is, we want to average
quantity relatives $q_n^{(i)}/q_0^{(i)}$, and these may be weighted, like the price
relatives, in various ways. We can, in fact, form seven quantity index numbers
analogous to the seven price index numbers already mentioned, for example,

(5.9)     $Q_L = \frac{\sum w_0 (q_n/q_0)}{\sum w_0} = \frac{\sum p_0 q_n}{\sum p_0 q_0}$

(5.10)    $Q_P = \frac{\sum w_n (q_n/q_0)}{\sum w_n} = \frac{\sum p_n q_n}{\sum p_n q_0}$

(5.11)    $Q_F = \sqrt{Q_L Q_P} = \left[\frac{\sum p_0 q_n \cdot \sum p_n q_n}{\sum p_0 q_0 \cdot \sum p_n q_0}\right]^{1/2}$

Here the weights $w_0$ are the values of base year consumption at base year
prices, and thus are the same as the $v_0$, but the weights $w_n$ are the values
of base year consumption at current year prices (that is, $w_n = p_n q_0$), and
therefore are not the same as the $v_n$.

5.5 Fisher's Tests for Index Numbers. Irving Fisher suggested that a
good index number should satisfy two tests which he called the “time-reversal
test” and the “factor-reversal test.” The first means that the number for
year $n$ relative to year 0 should be the reciprocal of the number for year 0
relative to year $n$. This is obviously true for one commodity, since $p_n/p_0$ is
the reciprocal of $p_0/p_n$; but it is not true for some index numbers, as is easily
seen by interchanging 0 and $n$ in the expressions (5.1) and (5.2). Thus,
$\sum p_0 q_n / \sum p_n q_n$ is not the reciprocal of $\sum p_n q_0 / \sum p_0 q_0$. The test
is satisfied, however, by $P_{ME}$, $P_W$, and $P_F$.

The second test means that a formula which is right for prices should also
be right for quantities. That is, the price index multiplied by the
corresponding quantity index should give the true ratio of value in year $n$ to value
in year 0. This is true for a single commodity, since $p_n/p_0$ times $q_n/q_0$ gives
$p_n q_n / p_0 q_0$; but it is not true for most index numbers. Thus,

    $P_L Q_L = \frac{\sum p_n q_0}{\sum p_0 q_0} \cdot \frac{\sum q_n p_0}{\sum q_0 p_0}$

and this is not equal to the ratio of values $\sum p_n q_n / \sum p_0 q_0$. The test is, of
course, satisfied by Fisher's index number.

A good test for the consistency of a calculated price index number is the
difference between $P_L$ and $P_P$. If this difference is small, say less than 2
points, either of them, or a mean of the two, may be accepted as a reasonable
measure of price change, but if the difference is large not much confidence
can be placed in any index number for the data used.
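Both reversal tests are easy to try numerically. In the sketch below the quantity index is obtained by interchanging the roles of prices and quantities in the same formula; the helper names and the three-commodity data are hypothetical.

```python
import math

def laspeyres(p0, pn, q0, qn):
    return sum(p * q for p, q in zip(pn, q0)) / sum(p * q for p, q in zip(p0, q0))

def paasche(p0, pn, q0, qn):
    return sum(p * q for p, q in zip(pn, qn)) / sum(p * q for p, q in zip(p0, qn))

def fisher(p0, pn, q0, qn):
    return math.sqrt(laspeyres(p0, pn, q0, qn) * paasche(p0, pn, q0, qn))

def time_reversal_product(index, p0, pn, q0, qn):
    """Forward index times backward index; equals 1 if the test is met."""
    return index(p0, pn, q0, qn) * index(pn, p0, qn, q0)

def factor_reversal_gap(index, p0, pn, q0, qn):
    """Price index times quantity index, minus the true value ratio."""
    price_index = index(p0, pn, q0, qn)
    quantity_index = index(q0, qn, p0, pn)   # prices and quantities interchanged
    value_ratio = (sum(p * q for p, q in zip(pn, qn))
                   / sum(p * q for p, q in zip(p0, q0)))
    return price_index * quantity_index - value_ratio

# Hypothetical data for three commodities.
p0 = [2.0, 5.0, 10.0]
pn = [3.0, 6.0, 18.0]
q0 = [100.0, 40.0, 10.0]
qn = [90.0, 45.0, 6.0]

for index in (laspeyres, paasche, fisher):
    t = time_reversal_product(index, p0, pn, q0, qn)
    f = factor_reversal_gap(index, p0, pn, q0, qn)
    print(f"{index.__name__:10s} time-reversal product = {t:.6f}   "
          f"factor-reversal gap = {f:+.6f}")
```

For the Fisher number the product is 1 and the gap is 0 up to rounding error; for the Laspeyres and Paasche numbers neither holds with this data.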

5.6 Geometric and Harmonic Means of Relatives. It has been stated in
§4.10 that the geometric mean is a suitable average for ratios, and since price
and quantity relatives are ratios the geometric mean (suitably weighted)
should give a good index number. It has, in fact, been extensively used in
Britain. If we use the weights $v_0$ (base year values), the geometric mean index
number is given by

(5.12)    $(P_{GM})^{\sum v_0} = \left(p_n^{(1)}/p_0^{(1)}\right)^{v_0^{(1)}} \cdot \left(p_n^{(2)}/p_0^{(2)}\right)^{v_0^{(2)}} \cdots \left(p_n^{(N)}/p_0^{(N)}\right)^{v_0^{(N)}} = \prod (p_n/p_0)^{v_0}$

or, writing $v_0' = v_0/\sum v_0$,

(5.13)    $\log P_{GM} = \sum v_0' \log (p_n/p_0)$

This is a reliable index, and one which is comparatively little affected by
sampling error (the error caused by the selection of the particular items to
make up the index). It does not satisfy in general the two Fisher tests
described in §5.5.
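Formula (5.13) translates directly into a computation with logarithms. The sketch below applies it to the prices and 1939 quantities of Table 17 and, for comparison, forms the arithmetic and harmonic means of the same relatives with the same base-year weights; the function names are assumptions made for the example.

```python
import math

# Table 17: base-year prices, current prices, base-year quantities
p0 = [3.3, 2.25, 17.4, 6.4, 34.5, 26.8, 18.6]
pn = [6.9, 3.31, 55.7, 9.7, 65.0, 61.4, 30.1]
q0 = [185.0, 192.0, 10.5, 94.7, 3.7, 3.4, 7.4]

def normalized_base_weights(p0, q0):
    """v0' = v0 / sum(v0), with v0 = p0 * q0."""
    v0 = [p * q for p, q in zip(p0, q0)]
    total = sum(v0)
    return [v / total for v in v0]

def geometric_mean_index(p0, pn, q0):
    """(5.13): log P = sum of v0' * log(pn/p0), using common logarithms."""
    w = normalized_base_weights(p0, q0)
    log_index = sum(wi * math.log10(b / a) for wi, a, b in zip(w, p0, pn))
    return 10 ** log_index

def arithmetic_mean_index(p0, pn, q0):
    """Weighted arithmetic mean of relatives (the Laspeyres number)."""
    w = normalized_base_weights(p0, q0)
    return sum(wi * b / a for wi, a, b in zip(w, p0, pn))

def harmonic_mean_index(p0, pn, q0):
    """Weighted harmonic mean of relatives with the same weights."""
    w = normalized_base_weights(p0, q0)
    return 1.0 / sum(wi * a / b for wi, a, b in zip(w, p0, pn))

arithmetic = arithmetic_mean_index(p0, pn, q0)
geometric = geometric_mean_index(p0, pn, q0)
harmonic = harmonic_mean_index(p0, pn, q0)
print(f"arithmetic {arithmetic:.3f}  geometric {geometric:.3f}  harmonic {harmonic:.3f}")
```

With a common set of weights the three means always come out in the order harmonic, geometric, arithmetic, in line with the inequality $A > G > H$ of Chapter IV.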

Ferger (see Ref. 4 of Chap. IV) has argued for the harmonic mean,

    $P_{HM} = \frac{\sum v_0}{\sum (v_0 p_0/p_n)} = \frac{\sum (p_0 q_0)}{\sum (p_0^2 q_0/p_n)}$

on the ground that the harmonic mean is lower than either of the others and
so takes account of the economies that may be practiced by suitably varying
the relative quantities of the different commodities purchased. These
economies will lower the effective price level.

5.7 Series of Index Numbers. A long series of index numbers may be
calculated either by using the same base period all the time (the fixed-base
method) or by linking together sets of index numbers for successive periods
(the chain method). If we denote the base period by 0 and the successive
following periods by 1, 2, 3, ..., $n$, ..., the series of fixed-base index numbers
may be written

    $P_{01},\ P_{02},\ P_{03},\ \ldots,\ P_{0n}$,

each calculated by direct application of one of the formulas given above.
The successive chain index numbers may be written

    $P_{01},\ P_{01} \cdot P_{12},\ P_{01} \cdot P_{12} \cdot P_{23},\ \ldots$

For the fixed weight aggregative formula the two series are equivalent. Thus

    $P_{02} = \sum p_2 q_a / \sum p_0 q_a$

    $P_{01} \cdot P_{12} = \frac{\sum p_1 q_a}{\sum p_0 q_a} \cdot \frac{\sum p_2 q_a}{\sum p_1 q_a} = P_{02}$

But for the better formulas this is not true. With Laspeyres' index number,

    $P_{02} = \sum p_2 q_0 / \sum p_0 q_0$

    $P_{01} \cdot P_{12} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \cdot \frac{\sum p_2 q_1}{\sum p_1 q_1}$

and $P_{01} \cdot P_{12}$ is not the same as $P_{02}$. The fixed-base system is the easier to
calculate, but the chain system is more reliable, for it uses weights which are
calculated afresh for each link in the chain.
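The difference between the two systems shows up in a three-period example. The data below are hypothetical, and taking the fixed weights as the average quantity over the three periods is just one possible choice made for this sketch.

```python
def aggregative_index(p_from, p_to, q):
    """Cost of the quantities q at prices p_to, relative to cost at p_from."""
    return (sum(a * b for a, b in zip(p_to, q))
            / sum(a * b for a, b in zip(p_from, q)))

# Hypothetical prices and quantities of three commodities in periods 0, 1, 2.
prices = [[2.0, 5.0, 10.0], [2.5, 5.5, 14.0], [3.0, 6.0, 18.0]]
quantities = [[100.0, 40.0, 10.0], [95.0, 42.0, 8.0], [90.0, 45.0, 6.0]]

# Fixed weights: average quantity of each commodity over the three periods.
q_a = [sum(column) / len(column) for column in zip(*quantities)]

# Fixed-weight aggregative: direct P02 and chained P01 * P12.
fixed_direct = aggregative_index(prices[0], prices[2], q_a)
fixed_chain = (aggregative_index(prices[0], prices[1], q_a)
               * aggregative_index(prices[1], prices[2], q_a))

# Laspeyres: each link is weighted by the quantities of its own starting period.
lasp_direct = aggregative_index(prices[0], prices[2], quantities[0])
lasp_chain = (aggregative_index(prices[0], prices[1], quantities[0])
              * aggregative_index(prices[1], prices[2], quantities[1]))

print(f"fixed-weight: direct {fixed_direct:.4f}  chain {fixed_chain:.4f}")
print(f"Laspeyres:    direct {lasp_direct:.4f}  chain {lasp_chain:.4f}")
```

The fixed-weight pair agrees to rounding error because the intermediate sum cancels, while the Laspeyres pair differs because the second link is weighted by period-1 quantities.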

Sometimes it is convenient to change one commodity for another in the
index. In the textile world new fibers (e.g., nylon) are continually coming
into use, and old ones (e.g., real silk) practically dropping out of use. It is
possible to change the one for the other, provided we have data for an
overlapping period, without disrupting the index. Thus, if $q_0^{(1)}$, $q_0^{(2)}$ are base
year quantities for commodities $X_1$ and $X_2$, the value of $q_0^{(2)}$ being hypothetical
because $X_2$ did not exist in the base year, and if $p_n^{(1)}$, $p_n^{(2)}$ are current year
prices for the two commodities, we may put

    $q_0^{(2)} = q_0^{(1)}\, p_n^{(1)} / p_n^{(2)}$

This makes $p_n^{(2)} q_0^{(2)} = p_n^{(1)} q_0^{(1)}$, so that for the overlapping period it makes
no difference whether we use $X_1$ or $X_2$ in the index.

5.8 Adjusted Death Rates. Death rates are expressed as number of
deaths per 1000 of population, and naturally these rates differ in different
age groups, being largest among the very young and the very old. In
comparing the death rates of two communities it is not accurate to use simply
the crude death rates regardless of age, because the two communities may
have a different age composition. (One may have a larger proportion of
young children than the other, for instance.) A method of allowing for this
is to form for each community a weighted average of the specific death rates
in each age group, using the same weights for both communities, namely,
the numbers in each age group in a certain fixed standard population. The
weighted average of specific death rates is thus a kind of index number for
the community and may fairly be used in a comparison. It is called a
standardized or adjusted death rate. In Statistical Abstracts of the United
States, for the last few years, specific death rates are given for various age
groups, as well as an adjusted death rate. For the adjustment a standard
population based on the actual United States population of 1940 is used.

A simple example of this comparison of death rates is given by Pearl
(Reference 3). The crude death rates in 1910 of two cities of about the same
population (Providence, R. I., and Seattle, Wash.) were 17.76 and 10.26,
with a ratio of 1.73. The specific rates for different age groups of the
population are given in Table 18, together with a “standard million” based on the
actual composition of the United States population in 1910, which gives the
numbers, out of a total of one million, in each of the age groups cited. The
adjusted death rates are then calculated as weighted averages of specific
death rates and the results are 18.20 for Providence and 11.90 for Seattle,
with a ratio of 1.53. The apparently high ratio of death rates for Providence
and Seattle was therefore due, at least in part, to differences in age
composition of the two populations, and in fact a breakdown of the population figures
shows that Seattle had in 1910 a considerably higher proportion than
Providence of young adults (a group with low specific death rate) and a
considerably lower proportion of children under 5 and adults over 60 (groups with
high specific death rates).

Table 18. Death Rates for Providence, R. I., and Seattle, Wash., 1910

Providence (Population 224,050)

    Age Group    Percentage of       Specific Death    Standard        $pq/10^6$
                 Total Population    Rate ($p$)        Million ($q$)
    0-5               9.74              53.??            115,806          6.24
    5-10              8.35               3.??            106,321          0.42
    10-20            17.10               3.70            197,931          0.74
    20-40            37.30               7.13            333,379          2.38
    40-60            20.75              18.37            178,845          3.28
    60-80             6.30              67.61             62,391          4.22
    over 80           0.47             172.02              5,327          0.92
    Total           100.01                             1,000,000         18.20

Seattle (Population 234,719)

    Age Group    Percentage of       Specific Death    Standard        $pq/10^6$
                 Total Population    Rate ($p$)        Million ($q$)
    0-5               7.26              26.58            115,806          3.08
    5-10              6.44               3.31            106,321          0.35
    10-20            13.92               3.28            197,931          0.65
    20-40            46.58               5.70            333,379          1.90
    40-60            21.22              12.55            178,845          2.24
    60-80             4.32              44.08             62,391          2.75
    over 80           0.25             174.68              5,327          0.93
    Total            99.99                             1,000,000         11.90

(0-5 means up to but not including 5 years)
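The adjustment amounts to one weighted average with two different sets of weights. The sketch below uses the Seattle figures of Table 18; the function name `weighted_rate` is an assumption, and the crude rate is recovered here from the city's own age percentages.

```python
# Age groups: 0-5, 5-10, 10-20, 20-40, 40-60, 60-80, over 80
standard_million = [115806, 106321, 197931, 333379, 178845, 62391, 5327]

# Seattle, 1910 (Table 18)
seattle_percent = [7.26, 6.44, 13.92, 46.58, 21.22, 4.32, 0.25]
seattle_rate = [26.58, 3.31, 3.28, 5.70, 12.55, 44.08, 174.68]   # deaths per 1000

def weighted_rate(rates, weights):
    """Weighted average of specific death rates."""
    return sum(r * w for r, w in zip(rates, weights)) / sum(weights)

# Crude rate: weights are the city's own age distribution.
crude = weighted_rate(seattle_rate, seattle_percent)
# Adjusted rate: weights are the standard million.
adjusted = weighted_rate(seattle_rate, standard_million)

print(f"crude death rate    = {crude:.2f} per 1000")
print(f"adjusted death rate = {adjusted:.2f} per 1000")
```

The same function applied to the Providence column with the same standard million gives the figure against which the Seattle rate is compared.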


Exercises 

1. Compute from Table 19 four index numbers (Laspeyres, Paasche, Marshall-Edgeworth,
and Fisher) of farm crop prices in the United States for 1935, with 1919 as base year.
Ans. $P_L = 38.1$, $P_P = 37.5$, $P_F = 37.8$, $P_{ME} = 37.8$ (1919 price = 100).

Table 19. Average Prices and Total Production for Twelve Leading
Farm Crops in United States, 1919 and 1935

    Crop        Unit     Price (dollars)        Production (millions)
                         1919       1935        1919        1935
    Corn        bus.     1.343      0.547       2679        2203
    Wheat       bus.     2.131      0.894        952.1       603.2
    Cotton      lb.      0.356      0.115       5705        5365
    Hay         ton     20.15       7.23          76.59       75.62
    Oats        bus.     0.702      0.257       1107        1195
    Tobacco     lb.      0.390      0.185       1444        1284
    Potatoes    bus.     1.580      0.634        297.3       356.4
    Sugar       lb.      0.102      0.031       4371        *6278
    Barley      bus.     1.215      0.377        131.7       292.2
    Rye         bus.     1.331      0.402         78.7        57.9
    Rice        bus.     2.666      0.624         42.69       38.46
    Flaxseed    bus.     4.383      1.548          6.77       14.93

Source: Yearbook of Agriculture, and “Crops and Markets” (U. S. Dept. of Agriculture).
Quoted by Mills, Statistical Methods, 1938 (Holt), pp. 179-206.

* 1934 production figure used.

2. Draw up a table of logarithms of price relatives for the data of Table 19, and compute
a geometric mean index of farm prices, using weights based on 1919 production.
Ans. $P_{GM} = 37.8$.

3. Table 20 gives the average prices of selected types of meat at three periods, April
1940, January 1947, and January 1948, in retail stores in Edmonton, Alberta, and also
average quantities of each purchased per family by housewives in the month of April 1940.
Construct a retail meat price index for 1947 and 1948 (Laspeyres' formula), with 1940 as
base year. Ans. $P_L = 166.6$ (1947), 196.9 (1948).

Table 20. Prices of Selected Cuts of Meat, Edmonton, 1940, 1947, 1948,
and Consumption, 1940

    Meat                      $p_0$ (1940)    $q_0$     $p_n$ (1947)    $p_n$ (1948)
                              cents/lb        lb
    Beef, prime rib roast       18.48         10          30.46           35.68
    Beef, rolled rib roast       [?]           6          40.27           44.50
    Beef, shoulder roast        13.71          4          24.18           30.07
    Beef, flank stew             9.72          4          19.19           24.04
    Beef, sirloin steak         22.30          2          43.76           47.85
    Veal, shoulder roast        16.46          4          23.15           27.66
    Pork, shoulder roast        17.38          2          27.33           36.61
    Pork, loin                  24.83          0.25       40.00           46.62
    Bacon, sliced side          29.90          1.75       50.74           65.71
    Ham, boiled sliced          54.31          0.75       69.13           78.69


4. Table 21 gives approximate figures for the specific death rates by race for Mississippi,
Florida, and the United States, 1948. The crude death rates for these two states were 9.9
and 9.5 per 1000. The United States population as a whole may be taken as 89.8% white,
and Mississippi and Florida as about [?].7% and 72.8% white. Calculate adjusted death
rates for Mississippi and Florida, corrected for the different race composition of the two
states. (Use the actual United States population as standard.) Ans. 9.3 and 9.5.

Table 21. Specific Death Rates (per 1000) by Race, 1948

                      White     Non-White
    United States      9.7        11.7
    Mississippi        9.1        10.7
    Florida            9.4        10.3

Source: Statistical Abstracts of U. S., 1951.


5. Adjust the death rates for Seattle and Providence (Table 18) by the following
“standard million,” which is based on the population of England and Wales in 1901 and
which has been widely used.

England and Wales Standard Million

    Age Group    Population        Age Group    Population
    0-4           114,262          40-44          56,883
    5-9           107,209          45-49          48,365
    10-14         102,735          50-54          40,857
    15-19          99,796          55-59          32,359
    20-24          95,946          60-64          27,382
    25-29          86,833          65-69          19,358
    30-34          74,746          70-74          13,722
    35-39          65,966          75-79           8,131
                                   80-             5,450

                                   Total       1,000,000

6. (Woods and Russell) Specific death rates for clergymen and for railwaymen (1900-1902)
are given in the following table:

    Age group    Clergymen    Railwaymen
    45-54           9.82         13.34
    55-64          23.38         29.76
    65-            82.57         93.17

The crude death rates were 35.34 for clergymen and 26.52 for railwaymen, with a ratio of
1.33. Show that when the rates are adjusted to the England and Wales standard million
(Exercise 5), they become 31.31 and 37.39, with a ratio of 0.837. (The great difference
between adjusted and unadjusted rates is due to the fact that the proportion of clergymen
in the highest age group, with a high specific death rate, is much greater than the
proportion of railwaymen in the same group. The crude death rate for clergymen is therefore
relatively too high.)

CHAPTER VI

STANDARD DEVIATION AND OTHER MEASURES OF DISPERSION 

6.1 Various Measures of Dispersion. As already mentioned, the
dispersion of a distribution is indicated by the extent to which observed values
of the variate tend to spread over an interval rather than to cluster closely
around a central average. One measure of dispersion is the quartile
deviation, described in §3.4, but there are also in common use the range, the mean
absolute deviation, and the standard deviation. Of these, the standard
deviation is by far the most important.

6.2 The Range. The range is the most simple and obvious measure 
of dispersion. It is the length of an interval which just covers the highest 
and the lowest observed values in a set, and thus measures the spread in the 
most direct way possible. Among the grades of Table 2, §1.8, the highest 
is 98 and the lowest 34, so that the range is 98 — 34, or 64. 

If the observed values are measurements which are recorded with a finite “step” (for
example, lengths recorded to the nearest half inch), the range as calculated by
subtracting the lowest value from the highest should, strictly speaking, be increased by
this step. Thus, if the numbers in Table 2 represented the results of 100 measurements of
lengths, in inches to the nearest half inch, the number 98 might mean a length as great
as 98.25 in. and the number 34 a length as small as 33.75 in. The range would therefore
be 64.5 in.

In grouped frequency distributions the range is usually taken as the difference between
the highest and lowest class marks, but if the total frequency is sufficiently great for
even the end classes to contain several members, it is probably better to subtract the
lower boundary of the first class from the upper boundary of the last class. The range,
however, is not a very precise measure, being greatly affected by the presence of a few
exceptionally large or exceptionally small values among those observed. Thus in a sample
of 738 Welshmen the range of weight was 190 lb, but this range was reduced to 120 lb
merely by omitting the 5 heaviest men. (See §0.4, Reference 22, p. 111.)

The outstanding merit of the range is its simplicity, and for this reason it is used a
great deal in industrial quality control. In many modern factories a running check is
kept on the quality of the output by taking regular samples, and noting both the mean (as
a measure of the average level) and the range (as a measure of variability). The variable
measured may be, for example, the breaking strength of a sample of cloth or the width of
a slot in a machine part. For practical reasons the sample must be small (five is a
common size), and the computations must be within the capabilities of a works foreman, so
that he can take action at once if action seems to be called for. In routine control all
he needs to do is to add five readings and subtract the lowest from the highest, plotting
the results on prepared charts, and as long as the plotted points stay well inside the
limits marked on the charts, the factory process may be assumed to be satisfactorily “in
control.”
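
The routine just described amounts to two small computations per sample. The sketch below
is illustrative only: the readings and the chart limits are invented for the example (real
limits come from prepared control-chart tables), and the names `mean_and_range`,
`MEAN_LIMITS`, and `RANGE_LIMIT` are hypothetical.

```python
def mean_and_range(readings):
    """Return the two control statistics for one sample: (mean, range)."""
    return sum(readings) / len(readings), max(readings) - min(readings)


# Hypothetical breaking strengths (lb) for one sample of five.
sample = [51.2, 49.8, 50.5, 50.1, 49.4]

# Hypothetical chart limits, assumed here purely for illustration.
MEAN_LIMITS = (49.0, 51.0)
RANGE_LIMIT = 3.0

sample_mean, sample_range = mean_and_range(sample)
in_control = (
    MEAN_LIMITS[0] <= sample_mean <= MEAN_LIMITS[1]
    and sample_range <= RANGE_LIMIT
)
print(sample_mean, sample_range, in_control)
```

Both statistics need only addition, one division, and one subtraction, which is the
practical point made above.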

The defects of the range are that it depends only on two values out of the 
whole set, that it is comparatively sensitive to fluctuations of sampling 
(different samples of the same size from the same population may have widely 
different ranges), and that it is difficult to work with mathematically. But 
in spite of this difficulty, fairly good tables now exist from which it can readily 
be determined whether an observed range in a sample of given size (at least 
from certain types of parent population) is, or is not, unusual. See §13.15. 

6.3 Deviations. The deviation of a value $x_i$ from a fixed arbitrary value $x_0$ was
defined in §4.5 as the difference $x_i - x_0$. In practice $x_0$ is almost always taken as
the arithmetic mean $\bar{x}$, and if so we have the important result:

Theorem 1. The arithmetic mean of the deviations is zero.

Proof: Let $x_i' = x_i - \bar{x}$, and let the frequency corresponding to $x_i$ be $f_i$.
Then

$$\sum f_i x_i' = \sum f_i x_i - \bar{x}\sum f_i$$

But by Eq. (4.2) $\sum f_i x_i = N\bar{x}$ and $\sum f_i = N$, whence $\sum f_i x_i' = 0$.
Dividing by $N$, we have the result stated.

The greater the spread of a distribution, the greater numerically will be 
the largest deviations, and it seems reasonable to take, as a measure of the 
variation about the mean, a suitable average of all the deviations. The 
arithmetic mean of the deviations, as we have just seen, is zero, but we can 
take the arithmetic mean of the absolute deviations (disregarding signs), or 
the root mean square average (in which all the deviations are squared). The 
former procedure gives the mean absolute deviation and the latter the standard 
deviation. 

6.4 Mean Absolute Deviation. The absolute value of any real number $y$, denoted by $|y|$,
is the numerical value of $y$, with positive sign. That is, $|y| = y$ if $y \ge 0$, and
$|y| = -y$ if $y < 0$. The mean absolute deviation (M.A.D.), often inaccurately called the
mean deviation, is defined by

$$\text{M.A.D.} = \frac{1}{N}\sum f_i\,|x_i - \bar{x}| \tag{6.1}$$

The following table shows the calculation of the mean absolute deviation for the grades
in Table 14, §4.5, where the mean value of $x$ is 72.5.


   x        f     |x - x̄|    f|x - x̄|
  34.5      2       38          76
  44.5      3       28          84
  54.5     11       18         198
  64.5     20        8         160
  74.5     32        2          64
  84.5     25       12         300
  94.5      7       22         154
  Total   100                 1036

M.A.D. = 1036/100 = 10.36
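
The tabular calculation above can be checked with a few lines of Python. This is a minimal
sketch; the variable names are illustrative and the data are the class marks and
frequencies of the table.

```python
# Class marks and frequencies for the grades (Table 14, §4.5).
class_marks = [34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
frequencies = [2, 3, 11, 20, 32, 25, 7]

n = sum(frequencies)
mean = sum(f * x for x, f in zip(class_marks, frequencies)) / n

# Equation (6.1): frequency-weighted mean of the absolute deviations.
mad = sum(f * abs(x - mean) for x, f in zip(class_marks, frequencies)) / n
print(n, mean, mad)
```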


There is some theoretical reason for defining the M.A.D. in terms of deviations from the
median. It can be proved that $\sum |x_i - x_0|$ is a minimum, for any choice of $x_0$,
when $x_0$ is the median. However, the use of the mean is conventional.

For a grouped distribution a coded method of calculating the M.A.D. can 
be used, but the formula is not very convenient to use. The mean absolute 
deviation is generally unwieldy in mathematical discussions, and its chief 
use in statistics is in situations where occasional large and erratic deviations 
are likely to occur. The standard deviation, which uses the squares of these 
large deviations, tends to overemphasize them. 

6.5 The Standard Deviation. The common measure of dispersion, to be preferred in most
circumstances, is the root-mean-square average of the deviations from the mean. The name
standard deviation for this quantity was proposed by Karl Pearson. We shall denote it by

$$s_x = \left[\frac{1}{N}\sum f_i (x_i - \bar{x})^2\right]^{1/2}, \qquad N = \sum f_i \tag{6.2}$$

The square root has the effect of making the unit of $s_x$ the same as the unit of $x_i$.
If $x_i$ is in feet, $(x_i - \bar{x})^2$ is in square feet, and $s_x$ again in feet. The
quantity $s_x^2$ is called the variance, and in many ways it is more fundamental than the
standard deviation.

The traditional symbol for the standard deviation is the Greek letter $\sigma$ (sigma).
Many writers even use such phrases as “a deviation of three sigmas,” meaning a deviation
three times as great as the standard deviation. In recent years, however, the practice
has been to distinguish between a sample and the population from which it is drawn, by
using Latin letters for quantities characterizing the sample and Greek letters for
corresponding quantities characterizing the population. With this convention (which we
shall find very useful in the mathematical discussions of Part Two) the variance of a
population is denoted by $\sigma_x^2$ and that of a sample from the population by $s_x^2$.
One of the main problems of statistics is that of estimating characteristics of a
population from the characteristics of a finite, and possibly small, sample, but we can
rarely calculate $\sigma_x^2$ directly. We are therefore usually concerned in an actual
calculation with $s_x^2$.

It is proved in Part Two (§7.9) that if $s_x^2$ is the variance of a sample of $N$, the
“best” estimate of the population variance $\sigma_x^2$ (best in a certain sense which is
explained there*) is not $s_x^2$ but $N s_x^2/(N - 1)$. If, however, we define $s_x$ by

$$s_x^2 = \frac{1}{N - 1}\sum f_i (x_i - \bar{x})^2 \tag{6.3}$$

it will be true that, in the same sense as before, $s_x^2$ itself is the best estimate of
$\sigma_x^2$. For this reason several authorities (among others, R. A. Fisher and S. S.
Wilks) prefer to use the definition (6.3) instead of (6.2). Since the justification for
using (6.3) belongs to more advanced mathematical statistics, and since its use prevents
the definition of $s_x$ as an average of deviations, we shall continue to use the form
(6.2). Of course, for large values of $N$ the two forms are practically indistinguishable.
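
The difference between the two definitions is easy to see numerically. The sketch below
uses Python's standard `statistics` module, in which `pvariance` divides by $N$ as in (6.2)
and `variance` divides by $N - 1$ as in (6.3); the five data values are hypothetical.

```python
import statistics

values = [21, 39, 32, 37, 40]  # hypothetical small sample
n = len(values)

var_n = statistics.pvariance(values)         # divisor N, as in (6.2)
var_n_minus_1 = statistics.variance(values)  # divisor N - 1, as in (6.3)

# The two are linked by the factor N / (N - 1), which tends to 1 as N grows.
print(var_n, var_n_minus_1, var_n * n / (n - 1))
```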

6.6 Calculation of the Standard Deviation. Equation (6.2), although the definition of
$s_x$, is not well suited for actual computation. The quantities $x_i - \bar{x}$ are
usually awkward numbers to square, and if they are not carried out to several figures an
appreciable error may be introduced into the result. From (6.2), we have

$$N s_x^2 = \sum f_i (x_i - \bar{x})^2$$
$$= \sum f_i (x_i^2 - 2x_i\bar{x} + \bar{x}^2)$$
$$= \sum f_i x_i^2 - 2\bar{x}\sum f_i x_i + \bar{x}^2 \sum f_i$$

But $\sum f_i x_i = N\bar{x}$ and $\sum f_i = N$ so that we obtain, on substituting for
these sums,

$$N s_x^2 = \sum f_i x_i^2 - 2N\bar{x}^2 + N\bar{x}^2$$
$$= \sum f_i x_i^2 - N\bar{x}^2$$
$$= \sum f_i x_i^2 - \frac{1}{N}\left(\sum f_i x_i\right)^2 \tag{6.4}$$

This is the usual form for the calculation of $s_x$ for a discrete variate, or from a set
of ungrouped values of $x$. If $N$ is fairly small, all the $x_i$ may be treated
individually, in which case all the $f_i$ will be equal to 1 and we shall have

$$N s_x^2 = \sum_{i=1}^{N} x_i^2 - \frac{1}{N}\left(\sum_{i=1}^{N} x_i\right)^2 \tag{6.5}$$

* See also §7.9 of this book.

The arithmetic can often be simplified by subtracting a suitable fixed value $x_0$ from
each $x_i$. If $u_i = x_i - x_0$, and consequently, by (4.4), $\bar{u} = \bar{x} - x_0$,

$$N s_x^2 = \sum f_i \left[(x_i - x_0) - (\bar{x} - x_0)\right]^2$$
$$= \sum f_i (u_i - \bar{u})^2$$
$$= \sum f_i u_i^2 - \frac{1}{N}\left(\sum f_i u_i\right)^2 \tag{6.6}$$

The calculation can therefore be carried out entirely in terms of the variate $u$, and no
adjustment is needed in the result.

Example 1. Find the variance of the integers 1 to 10 inclusive. Here we use (6.5). By
Theorems 4 and 5 of §4.2,

$$\sum_{1}^{10} x_i = 10(10 + 1)/2 = 55$$

and

$$\sum_{1}^{10} x_i^2 = 10(10 + 1)(20 + 1)/6 = 385$$

so that

$$10 s_x^2 = 385 - (55)^2/10 = 82.5$$

and $s_x^2 = 8.25$.
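
As a check on Example 1, formula (6.5) can be applied directly. This is a small
illustrative sketch; the variable names are not from the text.

```python
xs = range(1, 11)  # the integers 1 to 10 inclusive
n = len(xs)

sum_x = sum(xs)
sum_x2 = sum(x * x for x in xs)

# Equation (6.5): N * s^2 = sum(x^2) - (sum x)^2 / N
n_times_variance = sum_x2 - sum_x**2 / n
variance = n_times_variance / n
print(sum_x, sum_x2, n_times_variance, variance)
```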

Table 22. Average Yields of Corn in Bushels per Acre
for a Certain Section in Illinois from 1901-1920

  Year     Yield (x)      u       u²
  1901        21        -15      225
  1902        39          3        9
  1903        32         -4       16
  1904        37          1        1
  1905        40          4       16
  1906        36          0        0
  1907        36          0        0
  1908        32         -4       16
  1909        36          0        0
  1910        39          3        9
  1911        33         -3        9
  1912        40          4       16
  1913        27         -9       81
  1914        29         -7       49
  1915        36          0        0
  1916        30         -6       36
  1917        38          2        4
  1918        36          0        0
  1919        36          0        0
  1920        35         -1        1
  Totals    N = 20      -32      488

Example 2. Find the mean yield of corn and the standard deviation of yields, for the
years 1901-1920, from the data of Table 22.

Here the items are ungrouped and can be treated individually. (It is hardly worth while
to rewrite the table merely for the sake of grouping together the six values of $x$ each
equal to 36.) Subtracting 36 from each $x$, we obtain the values of $u$ shown.

$$\sum u = -32 \quad \text{and} \quad \sum u^2 = 488$$

so that

$$20 s_x^2 = 488 - (32)^2/20 = 436.8$$

giving $s_x = 4.67$.

Since $\bar{u} = -32/20 = -1.6$, we have $\bar{x} = 36 - 1.6 = 34.4$. The mean and standard
deviation of the annual yields are therefore 34.4 bushels and 4.67 bushels, per acre.
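
The working of Example 2 can be reproduced as follows. The sketch applies (6.6) with the
same origin, 36, as the table; the variable names are illustrative.

```python
import math

# Yields for 1901-1920 (Table 22).
yields = [21, 39, 32, 37, 40, 36, 36, 32, 36, 39,
          33, 40, 27, 29, 36, 30, 38, 36, 36, 35]

x0 = 36                        # fixed value subtracted from each x
u = [x - x0 for x in yields]   # coded variate
n = len(u)

sum_u = sum(u)
sum_u2 = sum(v * v for v in u)

mean = x0 + sum_u / n                         # x-bar = x0 + u-bar
sd = math.sqrt((sum_u2 - sum_u**2 / n) / n)   # equation (6.6)
print(sum_u, sum_u2, mean, round(sd, 2))
```

No correction is needed for the change of origin, since subtracting a constant leaves the
deviations from the mean unchanged.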

Example 3. Calculate the standard deviation of the number of sixes in a throw of 12 dice
(Table 10, §2.3).

There is no need to change the variable since the $x_i$ are small numbers and we can use
(6.4). The calculations are shown in Table 23. Column 4 is easily obtained by multiplying
together the columns for $x$ and $fx$.

Table 23. Number of Sixes in a Throw of 12 Dice

    x        f        fx       fx²
    0       447        0         0
    1      1145     1145      1145
    2      1181     2362      4724
    3       796     2388      7164
    4       380     1520      6080
    5       115      575      2875
    6        24      144       864
    7         7       49       343
    8         1        8        64
  Total    4096     8191    23,259

$$\bar{x} = 8191/4096 = 2.000$$
$$s_x^2 = \frac{23259}{4096} - \left(\frac{8191}{4096}\right)^2 = 1.679$$
$$s_x = 1.296$$



6.7 Standard Deviation of a Grouped Continuous Variate. The method of §4.5 for
calculating the arithmetic mean of a continuous variate grouped in classes may be
extended to the calculation of the variance. If $c$ is the class interval and $x_0$ the
class mark for one of the central classes of the distribution, the coded variate $u$ is
given by $x_i = cu_i + x_0$, and the $u_i$ are simple consecutive integers when the
classes are all equal in width.

By (4.4), $\bar{x} = c\bar{u} + x_0$, so that

$$x_i - \bar{x} = c(u_i - \bar{u})$$

Hence,

$$N s_x^2 = \sum f_i (x_i - \bar{x})^2 = c^2 \sum f_i (u_i - \bar{u})^2 = c^2 N s_u^2$$

where $s_u^2$ is the variance of the $u$ variate, as defined in (6.4). We have therefore,

$$s_x^2 = c^2 s_u^2 \tag{6.7}$$

so that we merely have to multiply the calculated $s_u$ by $c$ to get the value of $s_x$.

Example 4. Calculate the standard deviation for the grades of Table 14, §4.5.

Here it is merely necessary to extend the table so as to include another column for
$fu^2$, found by multiplying together $u$ and $fu$. The computations are shown in Table
24. The last column is a check column (§6.8).

Table 24. Grades of 100 Students (Table 2)

  Class Mark   Frequency
      x            f          u       fu      fu²    f(u + 1)²
    34.5           2         -4       -8       32       18
    44.5           3         -3       -9       27       12
    54.5          11         -2      -22       44       11
    64.5          20         -1      -20       20        0
    74.5          32          0        0        0       32
    84.5          25          1       25       25      100
    94.5           7          2       14       28       63
    Total        100                 -20      176      236

$$\bar{u} = -0.2$$
$$100 s_u^2 = 176 - 400/100 = 172$$
$$s_u = (1.72)^{1/2} = 1.31$$
$$s_x = c s_u = 13.1$$

6.8 Charlier Check. In all but the simplest statistical calculations, and particularly
in the more long and complicated ones, it is desirable to have systematic checks on the
arithmetic. One such check, due to L. V. Charlier, may conveniently be introduced into
the work-sheet for the standard deviation. The check involves forming a column of values
of $f(u + 1)^2$, and depends on the identity:

$$\sum f(u + 1)^2 = \sum f(u^2 + 2u + 1) = \sum fu^2 + 2\sum fu + \sum f \tag{6.8}$$

To form the product $f(u + 1)^2$ we multiply each value of $f$ by the value of $u$ in the
next line below, and multiply the result again by the same $u$. In Table 24,
$\sum f(u + 1)^2 = 236$, and the right-hand side of (6.8) gives $176 + 2(-20) + 100$,
which is also 236, so that the arithmetic is checked.
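
The coded calculation of Example 4 and the Charlier check can be carried out together.
The following Python sketch is illustrative only; the variable names are assumptions, and
the data are the coded values and frequencies of Table 24 with $c = 10$.

```python
import math

c = 10                                # class interval
u = [-4, -3, -2, -1, 0, 1, 2]         # coded class marks (Table 24)
f = [2, 3, 11, 20, 32, 25, 7]         # frequencies

n = sum(f)
sum_fu = sum(fi * ui for fi, ui in zip(f, u))
sum_fu2 = sum(fi * ui * ui for fi, ui in zip(f, u))

# Charlier check, identity (6.8): the check column must equal
# sum(f u^2) + 2 sum(f u) + sum(f).
check_column = sum(fi * (ui + 1) ** 2 for fi, ui in zip(f, u))
check_ok = check_column == sum_fu2 + 2 * sum_fu + n

s_u = math.sqrt((sum_fu2 - sum_fu**2 / n) / n)   # equation (6.4) applied to u
s_x = c * s_u                                    # equation (6.7)
print(sum_fu, sum_fu2, check_column, check_ok, round(s_x, 1))
```

The check guards only against slips in forming the columns and totals; it would not reveal
a frequency copied wrongly from the original data.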

6.9 Grouping Error of the Standard Deviation. When dealing with a grouped variate we
calculate the mean and standard deviation as though all the values within a class were
equal to the class mark. In the calculation of the mean for a fairly large sample this
assumption usually introduces no appreciable error, because the errors introduced in the
region of values below the mean tend to be compensated by opposite errors in the region
above the mean. When the values are squared, however, the compensation of errors is far
from exact, particularly with a coarse grouping. To give a simple and artificial
illustration, suppose the population of the following table is grouped in two classes,
thus:

              Ungrouped                      Grouped
  x     1    2    3    4    5    6          2      5
  f    10   20   30   30   20   10         60     60

It is easily verified that $\sum fx$ is 420 in both cases, but $\sum fx^2$ is increased
from 1700 to 1740, so that $s_x^2$ is increased from 1.92 to 2.25.
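
The verification is quickly done by machine. In this sketch the helper name `variance` is
hypothetical; it simply applies (6.4) to a list of values and a list of frequencies.

```python
def variance(xs, fs):
    """Variance by equation (6.4) from values xs and frequencies fs."""
    n = sum(fs)
    sum_fx = sum(f * x for x, f in zip(xs, fs))
    sum_fx2 = sum(f * x * x for x, f in zip(xs, fs))
    return (sum_fx2 - sum_fx**2 / n) / n


ungrouped = variance([1, 2, 3, 4, 5, 6], [10, 20, 30, 30, 20, 10])
grouped = variance([2, 5], [60, 60])
print(round(ungrouped, 2), grouped)
```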

Unless we have the original data and work out the standard deviation exactly, we cannot
tell how large this grouping error really is. However, for samples of a few hundred which
are of the hump-backed type, tailing off gradually at both ends, it has been shown that,
on the whole, an improved estimate of the true variance is given by subtracting $c^2/12$
from the variance as calculated from the grouped distribution. This correction is known
as Sheppard's correction. It is most easily applied by subtracting 1/12 (= 0.0833) from
the calculated value of $s_u^2$.

As applied to Table 24, this correction would reduce $s_u^2$ from 1.72 to 1.6367, giving
$s_u = 1.28$ and $s_x = 12.8$. The true value can be calculated from the data in Table 2,
by using formula (6.5), and is 12.9, so that here the correction does improve the result.
However, the total frequency is rather small in this example for the application of
Sheppard's correction. With small samples the error due to the fluctuation from one
random sample to another is much larger than the correction, so that the latter is an
unnecessary refinement.
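
The arithmetic of the correction for Table 24 is sketched below. The starting value 1.72
and the class interval 10 are taken from Example 4; the variable names are illustrative.

```python
import math

c = 10          # class interval of Table 24
var_u = 1.72    # variance of the coded variate u, from Example 4

# Sheppard's correction in coded units: subtract 1/12 from s_u^2.
var_u_corrected = var_u - 1 / 12
s_u_corrected = math.sqrt(var_u_corrected)
s_x_corrected = c * s_u_corrected
print(round(var_u_corrected, 4), round(s_u_corrected, 2), round(s_x_corrected, 1))
```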

6.10 Meaning of the Standard Deviation. Because of the comparatively elaborate
calculation required to determine it, the standard deviation is not as easy to visualize
as the range, or even as the quartile deviation. Considered as an average deviation, it
is not as simple an average as the mean absolute deviation. Yet it is so important a
concept that every effort should be made to get a clear idea of its meaning and of its
approximate size in relation to the other measures of dispersion.

Many actually occurring distributions have frequency curves of the unimodal type, with
more or less flat “tails” at both ends, and are roughly symmetrical. Such frequency
curves can be regarded as approximations to a well-defined mathematical curve known as
the “normal curve,” which will be discussed in a later chapter. For the normal curve the
range is infinite, but 99.7% of the whole area under the curve is contained within an
interval from $3\sigma$ below the mean to $3\sigma$ above the mean ($\sigma$ being the
standard deviation). For a sample of two or three hundred individuals, therefore, the
effective range is about six times the standard deviation, and this fact is a useful
rough guide to the expected magnitude of the standard deviation. For very large samples
the range is greater than this, and for small samples less, relative to the standard
deviation.

Another property of the normal curve is that the area between ordinates at distances
$\sigma$ above and below the mean is 0.683 times the total area. Roughly, for any
distribution not too different from the “normal” type, about two-thirds of the whole
distribution is contained in an interval from $\bar{x} - s_x$ to $\bar{x} + s_x$. This is
perhaps the clearest picture one can get of the standard deviation, although it does not
hold for very skew distributions. It is illustrated in Fig. 21 for the distribution of
Table 24. It may be noted that for this distribution the interval is from 59.7 to 85.3.
The number of grades in Table 2 (§1.8) actually within these limits is 68, which is just
about what we should expect. Since one-half, exactly, of the distribution lies between
the quartiles, the standard deviation is larger than the quartile deviation. The ratio
$s_x/Q$ is usually close to 1½.

[Fig. 21. Standard Deviation]

For a normal distribution the ratio of the standard deviation to the mean absolute
deviation is 1.253, or almost 1¼. The mean absolute deviation for the grades in Table 24
is 10.36, so that we should expect a standard deviation of about 13, as we actually find.
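
The constants quoted for the normal curve in this section can be reproduced with Python's
standard library. This is a sketch under the assumption of an exactly normal distribution;
the function name `area_within` is hypothetical, while `math.erf` and
`statistics.NormalDist` are standard-library facilities.

```python
import math
from statistics import NormalDist


def area_within(k):
    """Fraction of the normal curve's area within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


within_1 = area_within(1)   # about two-thirds of the area
within_3 = area_within(3)   # nearly the whole area

# Ratio of standard deviation to mean absolute deviation for a normal curve.
sd_over_mad = math.sqrt(math.pi / 2)

# Ratio of standard deviation to quartile deviation: the upper quartile of
# the standard normal curve lies at inv_cdf(0.75) standard deviations.
sd_over_q = 1 / NormalDist().inv_cdf(0.75)

print(round(within_1, 3), round(within_3, 3), round(sd_over_mad, 3), round(sd_over_q, 2))
```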

We have already mentioned in §4.8 that when a distribution is represented by a histogram,
the centroid (or center of gravity) lies on the ordinate through the arithmetic mean. If
we think of the histogram as cut out of a thin uniform metal plate and rotated about an
axis in its plane through the centroid, it will have a certain “moment of inertia.” The
distance from the axis at which the entire mass could be considered as concentrated
without changing the moment of inertia is called the radius of gyration and is equal to
the standard deviation of the distribution. This illustration may help the mechanically
minded student.

6.11 Relative Dispersion. The size of observed values usually influences not only the
mean but also deviations from the mean. In other words, the magnitudes of the deviations
from the mean seem to be dependent, in some degree, upon the magnitude of the mean. In
comparing dispersion in distributions, we may correct for differences in the average
magnitudes of positive values by taking the ratio of the standard deviation to the mean.
Thus, the quantity

$$V = \frac{s_x}{\bar{x}} \tag{6.9}$$

is known as the coefficient of variation. It is obviously an abstract number, being
independent of the units of measurement, and it is usually expressed as a percentage.

The use of (6.9) may be misleading in situations where the origin from 
which the data are measured is somewhat arbitrary. Cases in point are 
temperature measurements and certain psychological data. 
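
Both points can be illustrated numerically. In the sketch below the data are invented and
the function name `coefficient_of_variation` is hypothetical; `statistics.pstdev` uses the
divisor $N$, in agreement with (6.2). A pure change of unit (inches to centimetres) leaves
the coefficient unchanged, whereas a change of origin (Celsius to Fahrenheit) alters it.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation (divisor N) divided by the mean, as in (6.9)."""
    return statistics.pstdev(values) / statistics.mean(values)


# Change of unit only: the coefficient is unaffected.
inches = [60.0, 64.0, 68.0, 72.0]               # hypothetical heights
centimetres = [2.54 * x for x in inches]

# Change of origin as well as unit: the coefficient is not comparable.
celsius = [10.0, 15.0, 20.0, 25.0, 30.0]        # hypothetical temperatures
fahrenheit = [1.8 * t + 32 for t in celsius]

v_in = coefficient_of_variation(inches)
v_cm = coefficient_of_variation(centimetres)
v_c = coefficient_of_variation(celsius)
v_f = coefficient_of_variation(fahrenheit)
print(v_in, v_cm, v_c, v_f)
```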

6.12 Some Theorems on Variance. One of the most important properties of the standard
deviation (or of the variance, which is its square) is the comparative ease with which it
can be manipulated in the theoretical discussions of mathematical statistics. Some
illustrations of this property are given in the present section.

Theorem 1. The sum of squares of deviations of the variate values from their mean is less
than the sum of squares of deviations from any other number. That is, for an arbitrary
$x_0$, not equal to $\bar{x}$,

$$N s_x^2 < \sum (x_i - x_0)^2 \tag{6.10}$$

(For convenience the $x_i$ are treated individually. There may, of course, be $f_i$
values all equal to $x_i$.)

Proof: $x_i - x_0 = x_i - \bar{x} + (\bar{x} - x_0)$

Therefore

$$\sum (x_i - x_0)^2 = \sum (x_i - \bar{x})^2 + \sum (\bar{x} - x_0)^2 + 2\sum (x_i - \bar{x})(\bar{x} - x_0)$$
$$= \sum (x_i - \bar{x})^2 + N(\bar{x} - x_0)^2 + 2(\bar{x} - x_0)\sum (x_i - \bar{x})$$

But $\sum (x_i - \bar{x}) = N\bar{x} - N\bar{x} = 0$, and $\sum (x_i - \bar{x})^2 = N s_x^2$.
Hence

$$\sum (x_i - x_0)^2 = N s_x^2 + N(\bar{x} - x_0)^2 \tag{6.11}$$

Since the last term on the right-hand side is necessarily positive, whether $x_0$ is
greater or less than $\bar{x}$, the theorem follows.
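
Relation (6.11) is easily checked on a small set of numbers. The values below are
hypothetical and the helper name `sum_sq_about` is an assumption made for this sketch.

```python
xs = [2, 4, 4, 5, 7, 9]   # hypothetical values, treated individually
n = len(xs)
mean = sum(xs) / n
n_s2 = sum((x - mean) ** 2 for x in xs)   # N * s_x^2


def sum_sq_about(x0):
    """Sum of squared deviations of the values from an arbitrary point x0."""
    return sum((x - x0) ** 2 for x in xs)


# For each trial origin, compare the left and right sides of (6.11).
comparison = [(x0, sum_sq_about(x0), n_s2 + n * (mean - x0) ** 2) for x0 in (0, 3, 6.5)]
for x0, left, right in comparison:
    print(x0, left, right)
```

In every case the sum of squares about $x_0$ exceeds $N s_x^2$ by exactly
$N(\bar{x} - x_0)^2$.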

Theorem 2. Let there be one set of $n_1$ variates $x_{1i}$ ($i = 1, 2, \ldots, n_1$) and
another set of $n_2$ variates $x_{2i}$ ($i = 1, 2, \ldots, n_2$) and let $\bar{x}$ be the
mean of the combined sets (Theorem 10, §4.6). The variance $s^2$ of the set formed by the
combination of these two sets is given by the following formula:

$$N s^2 = \sum_{1}^{n_1} (x_{1i} - \bar{x})^2 + \sum_{1}^{n_2} (x_{2i} - \bar{x})^2 \tag{6.12}$$

where $N = n_1 + n_2$.

Proof: The proof consists in showing that

$$\sum_{1}^{n_1} (x_{1i} - \bar{x})^2 + \sum_{1}^{n_2} (x_{2i} - \bar{x})^2 = \sum_{1}^{n_1 + n_2} (x_i - \bar{x})^2$$

which is left as an exercise for the student.

The foregoing theorem is not very important in itself, but it is useful in 
proving the next theorem which gives the relation between the variance of 
a composite set and the variances of subsets. 


Theorem 3. Let the frequency, mean, and standard deviation be denoted by $n_1$,
$\bar{x}_1$, and $s_1$ for one set of variates and by $n_2$, $\bar{x}_2$, and $s_2$ for a
second set. The variance of the composite set is given by the following relation:

$$N s^2 = n_1 s_1^2 + n_2 s_2^2 + n_1 d_1^2 + n_2 d_2^2$$

where $N = n_1 + n_2$, $d_1 = \bar{x}_1 - \bar{x}$, $d_2 = \bar{x}_2 - \bar{x}$, and
$\bar{x}$ is the mean of the composite set.

Proof: For the $n_1$ set, $\bar{x}$ may be regarded as an arbitrary point $x_0$. Hence by
(6.11) we have, after rearranging terms,

$$\frac{1}{n_1}\sum_{1}^{n_1} (x_{1i} - \bar{x}_1)^2 = \frac{1}{n_1}\sum_{1}^{n_1} (x_{1i} - \bar{x})^2 - (\bar{x}_1 - \bar{x})^2$$

Multiplying through by $n_1$ this becomes

$$n_1 s_1^2 = \sum_{1}^{n_1} (x_{1i} - \bar{x})^2 - n_1 d_1^2 \tag{6.13}$$

Similarly for the $n_2$ group we have

$$n_2 s_2^2 = \sum_{1}^{n_2} (x_{2i} - \bar{x})^2 - n_2 d_2^2 \tag{6.14}$$

Adding (6.13) and (6.14), and using (6.12), we obtain

$$n_1 s_1^2 + n_2 s_2^2 = N s^2 - n_1 d_1^2 - n_2 d_2^2$$

Hence

$$N s^2 = n_1 s_1^2 + n_2 s_2^2 + n_1 d_1^2 + n_2 d_2^2 \tag{6.15}$$

For $k$ sets combined into a single set we can generalize (6.15) into the following
relation:

$$N s^2 = \sum_{1}^{k} n_i s_i^2 + \sum_{1}^{k} n_i d_i^2 \tag{6.16}$$

where $N = \sum_{1}^{k} n_i$ and $d_i = \bar{x}_i - \bar{x}$. It is interesting to observe
that $\frac{1}{N}\sum n_i d_i^2$ is the variance of the means of the subsets. Thus we have
the important relation

$$s^2 = \frac{1}{N}\sum_{1}^{k} n_i s_i^2 + \frac{1}{N}\sum_{1}^{k} n_i d_i^2 \tag{6.17}$$

which shows that the total variance may be broken up into two parts, one of which is the
weighted mean of the variances in the subsets and the other is the variance of their
means. These two parts are sometimes called the average variance within classes and the
variance between the means of the classes. They become very important in the “Analysis
of Variance.” (See §12.16 and also Part Two, Chapter IX.)
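
Relation (6.17) can be tested by combining two small sets from their summaries and
comparing with the variance computed directly from the pooled values. The data and the
function name `combine` are hypothetical; `statistics.pvariance` uses the divisor $N$, in
agreement with (6.2).

```python
import statistics


def combine(groups):
    """Combine (n_i, mean_i, variance_i) summaries into (N, mean, variance) by (6.17)."""
    total_n = sum(n for n, _, _ in groups)
    mean = sum(n * m for n, m, _ in groups) / total_n
    within = sum(n * v for n, _, v in groups) / total_n                 # average variance within classes
    between = sum(n * (m - mean) ** 2 for n, m, _ in groups) / total_n  # variance between class means
    return total_n, mean, within + between


set_a = [3, 5, 7, 9]                 # hypothetical first set
set_b = [10, 12, 14, 16, 18, 20]     # hypothetical second set

summaries = [(len(g), statistics.mean(g), statistics.pvariance(g)) for g in (set_a, set_b)]
total_n, combined_mean, combined_var = combine(summaries)
direct_var = statistics.pvariance(set_a + set_b)
print(total_n, combined_mean, combined_var, direct_var)
```

Only the three summary figures for each subset are needed; the individual values enter the
sketch solely to provide the direct figure for comparison.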

Corollary. Equation (6.15) may also be written

$$N s^2 = n_1 s_1^2 + n_2 s_2^2 + \frac{n_1 n_2}{N}(\bar{x}_1 - \bar{x}_2)^2 \tag{6.18}$$

Proof:

$$n_1 d_1^2 + n_2 d_2^2 = n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2$$
$$= n_1\bar{x}_1^2 + n_2\bar{x}_2^2 - 2\bar{x}(n_1\bar{x}_1 + n_2\bar{x}_2) + (n_1 + n_2)\bar{x}^2$$

By Theorem 10, §4.6, $n_1\bar{x}_1 + n_2\bar{x}_2 = N\bar{x}$, and also $n_1 + n_2 = N$.
Therefore

$$n_1 d_1^2 + n_2 d_2^2 = n_1\bar{x}_1^2 + n_2\bar{x}_2^2 - N\bar{x}^2 \tag{6.19}$$
$$= n_1\bar{x}_1^2 + n_2\bar{x}_2^2 - (n_1\bar{x}_1 + n_2\bar{x}_2)^2/N$$

The coefficient of $\bar{x}_1^2$ is

$$n_1 - \frac{n_1^2}{n_1 + n_2} = \frac{n_1 n_2}{n_1 + n_2} = \frac{n_1 n_2}{N}$$

and similarly for $\bar{x}_2^2$. The coefficient of $\bar{x}_1\bar{x}_2$ is
$-2n_1 n_2/N$. Hence

$$n_1 d_1^2 + n_2 d_2^2 = \frac{n_1 n_2}{N}(\bar{x}_1^2 + \bar{x}_2^2 - 2\bar{x}_1\bar{x}_2) = \frac{n_1 n_2}{N}(\bar{x}_1 - \bar{x}_2)^2$$

The equation (6.18) then follows from (6.15). This form cannot be generalized to $k$ sets.

Exercises

1. What is the range in (a) the weights of Table 12, §2.6; (b) the scores in Exercise 9
below?

2. (E. S. Pearson). The following data represent the percentage of ash content in 280
wagon tests of a certain kind of coal. Find the mean and the standard deviation of the
distribution:

  Percentage
  Ash-Content    Frequency
   3.0- 3.9           1
   4.0- 4.9           7
   5.0- 5.9          28
   6.0- 6.9          78
   7.0- 7.9          84
   8.0- 8.9          45
   9.0- 9.9          28
  10.0-10.9           7
  11.0-11.9           2

Ans. $\bar{x}$ = 7.35%, $s_x$ = 1.36%.


3. (Camp). Find the mean wage and the standard deviation of wages for the following
distribution:

  Class            Frequency
  $4.50- 5.99          43
   6.00- 7.49          99
   7.50- 8.99         152
   9.00-10.49         178
  10.50-11.99         160
  12.00-13.49          40
  13.50-14.99          25
  15.00-16.49           3

Ans. $N$ = 700, $\bar{x}$ = $9.42, $s_x$ = $2.19.


4. Compute the mean absolute deviation for the data of Exercise 2. Find the ratio of
the M.A.D. to the standard deviation.

5. Find the mean and standard deviation for the data of Table 11, §2.3. Compare the
range with the standard deviation. Ans. $\bar{x}$ = 6.139, $s_x$ = 1.712, range = 10.

6. Find the mean and standard deviation for the distribution of lengths of telephone
calls given in Table 25. Use the Charlier check and Sheppard's correction.

Table 25. Distribution of Lengths of 996 Telephone Calls.
Time in Seconds

    Time      Number of Calls
    0-99              1
  100-199            28
  200-299            88
  300-399           180
  400-499           247
  500-599           260
  600-699           133
  700-799            42
  800-899            11
  900-999             6

7. Calculate the quartile deviation for the data of Table 25, and find the ratio to the
standard deviation. Ans. 0.69.

8. Find the percentage of values in the distribution of Table 25 outside the limits
$\bar{x} \pm s_x$, $\bar{x} \pm 2s_x$ and $\bar{x} \pm 3s_x$, respectively. Ans. 32.7,
5.0, 0.4.

Hint. Assume that the values within any class are equally spaced throughout the class
interval. It will be helpful to construct a histogram and mark off the limits mentioned
with ordinates. The calculation is similar to that for percentile ranks.

9. Find the mean and standard deviation for the following set of 25 scores: 82, 86, 76,
78, 72, 79, 63, 66, 67, 75, 68, 70, 79, 78, 51, 68, 65, 69, 68, 83, 80, 42, 43, 48, 47.
Ans. $\bar{x}$ = 67.6, $s_x$ = 12.7.

10. The following data were obtained in a mental test for 290 prospective employees:
mean = 43.33 pts., $s_x$ = 9.26 pts. The percentage of standard production attained by
these same persons after being employed varied about a mean of 92.02% with a standard
deviation of 24.47%. Compare the relative dispersions in mental test and production
ability.

11. Find $\bar{x}$, $s_x$, and the M.A.D. for the following distribution:

  x    2    4    6    8    10
  f    1    4    6    4     1


12. Show that (6.5) may be written

$$s_x = \frac{1}{N}\left[N\sum x^2 - \left(\sum x\right)^2\right]^{1/2}$$

(probably the most convenient formula for machine computation with ungrouped data).

13. For a set of ungrouped values the following sums are found:

$$N = 16, \quad \sum x = 480, \quad \sum x^2 = 15{,}736$$

Find the mean and the standard deviation.

14. If only two values, $x_1$ and $x_2$, are obtained for the variate $x$, and if
$\bar{x} = (x_1 + x_2)/2$, show that

$$(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 = (x_1 - x_2)^2/2$$

and that consequently

$$s_x = |x_1 - x_2|/2.$$

15. Verify the identity

$$3(x_1 - x_2)^2 + (x_1 + x_2 - 2x_3)^2 = 6\sum_{i=1}^{3} (x_i - \bar{x})^2$$

and so obtain an expression for the variance of three values $x_1$, $x_2$ and $x_3$, with
mean $\bar{x}$.

16. Given the following information about two sets of data:

  I     $n_1 = 20$,  $\bar{x}_1 = 25$,  $s_1^2 = 6$
  II    $n_2 = 30$,  $\bar{x}_2 = 20$,  $s_2^2 = 4$

Find the mean and variance of the composite set.

17. For a group of 50 boys the mean score and standard deviation of scores on a test 
are 69.5 and 8.38. For a group of 40 girls the mean and standard deviation are 64.0 and 
8.23 on the same test. Find the mean and standard deviation for the combined group of 
90 children. 

18. Calculate the mean and standard deviation of the first 26 positive integers 

19. Prove that the variance of the first $N$ positive integers is $(N^2 - 1)/12$.

20. For eight related distributions the following values are obtained:

  Distribution      1       2       3       4       5       6       7       8
  Frequency         7      14      32      49      55      54      35      14
  Mean            67.86   72.14   81.87   84.80   85.73   90.92   95.57   106.0
  Variance        106.1   101.8   246.5   283.6   257.6   294.5   222.5   71.43

Find the mean and variance of the whole distribution formed by the combination of these
eight. Ans. 87.31, 303.1.

Hint. Use equations (4.8) and (6.16).

21. Show that (4.8) and (6.15) may be written

$$\bar{x}_1 = (N\bar{x} - n_2\bar{x}_2)/n_1$$
$$s_1^2 = \left[N(\bar{x}^2 + s^2) - n_2(\bar{x}_2^2 + s_2^2)\right]/n_1 - \bar{x}_1^2$$

(These forms are required in the next exercise.)

22. In a certain distribution of $N = 25$ measurements, it was found that $\bar{x} = 56$
inches and $s = 2$ inches. After these results were computed it was discovered that a
mistake had been made in one of the measurements which was recorded as 64 inches. Find
the mean and standard deviation if the incorrect variate, 64, is omitted.

Hint. Let $n_1 = 24$, $n_2 = 1$. Then $\bar{x}_2 = 64$ and $s_2 = 0$. To find $\bar{x}_1$
and $s_1$ use formulas in Exercise 21 above.

23. If two or more variates are deleted from a distribution for which $N$, $\bar{x}$, and
$s$ are given, show how to compute the mean and variance of the remaining variates.

24. Consider a composite set consisting of $k$ subsets and let $s_i^2$ and $n_i$ denote,
respectively, the variance and number of variates in the $i$th subset, and
$N = \sum_{1}^{k} n_i$.

(a) If the subsets have equal means, show that the variance of the composite set is
given by

$$s^2 = \frac{1}{N}\sum_{1}^{k} n_i s_i^2$$

(b) If the subsets each contain the same number of variates and have equal means,
show that

$$s^2 = \frac{1}{k}\sum_{1}^{k} s_i^2$$

CHAPTER VII 

MOMENTS. SKEWNESS AND KURTOSIS 

7.1 Populations and Samples. One of the general problems of statistics 
is to summarize and characterize data. In the words of R. A. Fisher, 

“A quantity of data which by its mere bulk may be incapable of entering the mind is to
be replaced by relatively few quantities which shall adequately represent the whole, or
which, in other words, shall contain as much as possible, ideally the whole, of the
relevant information contained in the original data.”*

Among these “relatively few quantities” are those which are known as moments. Two of
them, the arithmetic mean and the variance, have already been discussed and two others
are in fairly common use, but a whole series of moments can be defined. The higher
moments are principally used to characterize populations rather than samples, and
therefore it will be well to clarify the distinction between a sample and the population
from which it is drawn.

There are three kinds of population. The first includes finite and actually existing
populations which, although large, can be enumerated if necessary. Examples are the total
number of persons living in the United States or the total number of peach farms in the
state of Georgia. Since complete censuses are time-consuming and costly, it is usual to
obtain information about such populations by investigating samples which are more or
less randomly chosen.

The second kind of population is a generalization from experience and is 
indefinitely large, such as, for instance, the total number of throws that might 
conceivably be made in unlimited time with a particular pair of dice. Any 
actual set of throws, however numerous, can be regarded as a sample from 
this practically infinite population. A third kind is a purely hypothetical 
population which can be completely described mathematically. That is, the 
distribution of values in the population is given by a mathematical formula. 
This type of population is used in the process known as curve-fitting or 
graduation, in which an actually occurring distribution is replaced for the purposes 
of further discussion by a mathematically described distribution which seems 
to have similar characteristics. If the “fit” is satisfactory, as judged by a 
test which we shall describe later, we can regard the observed sample as 
coming from a population which has the characteristics of the mathematical 

* See Reference 1. 

distribution. In the most common method of curve-fitting, moments play 
an important part. 

7.2 Moments about the Origin. The variate x with which we are concerned 
may be discrete or continuous. If discrete, x may take values $x_1, x_2, \ldots$ 
with frequencies $f_1, f_2, \ldots$ and a total population frequency M; if continuous, 
$x_1, x_2, \ldots$ are the class marks of classes with corresponding frequencies. When 
the population is indefinitely large, it does not make sense to speak of the 
frequencies $f_i$, but we can still speak of the proportion of values $p_i$ lying in 
the ith class. Similarly, as we shall see in the next chapter, when the population 
is described by a continuous mathematical function we can talk of the 
proportion lying between any two given values of x, $x_1$ and $x_2$. For a finite 
population, $p_i = f_i/M$ and when M is large this proportion can be regarded 
as the probability (see Chapter IX) that a value selected at random from the 
population will lie in the ith class. We shall define our moments first for a 
finite population. The rth moment about the origin of x is given by 

(7.1)  $\mu_r' = \frac{1}{M}\sum_i f_i x_i^r, \quad r = 0, 1, 2, 3, \ldots$

where $\sum_i f_i = M$. 

For a practically infinite population we write $f_i = M p_i$ and so obtain 

(7.2)  $\mu_r' = \sum_i p_i x_i^r$

For r = 0, $x_i^r = 1$, and so 

(7.3)  $\mu_0' = 1$

For r = 1, we obtain 

(7.4)  $\mu_1' = \frac{1}{M}\sum_i f_i x_i$  or  $\sum_i p_i x_i$

which is simply the arithmetic mean. For a population we shall denote the 
mean by $\mu$. 

For a sample of N, the moments about the origin may be defined by 

(7.5)  $m_r' = \frac{1}{N}\sum_i f_i x_i^r, \quad r = 0, 1, 2, \ldots$

where $\sum_i f_i = N$. 

The first moment $m_1'$ is then the arithmetic mean $\bar{x}$ of the sample. Similarly 
$m_2' = \frac{1}{N}\sum_i f_i x_i^2$, the arithmetic mean of squares of the values $x_i$, and 
so for moments of higher order. 
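The sample definition (7.5) can be sketched in a few lines of Python. This is only an illustration: the function name `raw_moment` and the small data set are assumptions made for the example and do not come from the text.

```python
def raw_moment(values, freqs, r):
    """r-th moment about the origin: sum(f * x**r) / sum(f), as in (7.5)."""
    total = sum(freqs)
    return sum(f * x ** r for x, f in zip(values, freqs)) / total

# Hypothetical grouped data, chosen only to keep the arithmetic easy to follow.
x = [0, 1, 2]
f = [1, 2, 1]

m0 = raw_moment(x, f, 0)  # zeroth moment, always 1
m1 = raw_moment(x, f, 1)  # arithmetic mean
m2 = raw_moment(x, f, 2)  # arithmetic mean of the squares
print(m0, m1, m2)
```

For these made-up values the first moment is the mean, 1, and the second is the mean of the squares, 1.5.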
The term “moment” has its origin in mechanics where we speak of the 
“moment of a force.” Suppose we have a rigid bar, called a lever, with one 
point of support known as a fulcrum (Fig. 22). If a force $f_1$ is applied to 
the lever at a distance $x_1$ from the fulcrum O, the product $x_1 f_1$ is called the 
moment of the force. If there are two or more such forces $f_1, f_2, \ldots, f_k$, acting 
in the same direction, and at the distances $x_1, x_2, \ldots, x_k$, respectively, 
from O, the total moment of all these forces is 

$f_1 x_1 + f_2 x_2 + \cdots + f_k x_k = \sum_i f_i x_i$

[Fig. 22]

If the distances x are squared, we have $\sum_i f_i x_i^2$ as the total second moment, 
and $\sum_i f_i x_i^r$ represents the rth moment. 

We have seen that in calculating the arithmetic mean and the standard 
deviation for a continuous variable, grouped in classes, it is convenient to 
change the variate from x to u, where 

$u = (x - x_0)/c$

In the same way, in calculating moments of any order, it is often convenient 
to obtain first the moments in terms of u, namely, 

(7.6)  $m_{r,u}' = \frac{1}{N}\sum_i f_i u_i^r, \quad r = 0, 1, 2, \ldots$

Here we are using two subscripts on m, the first denoting the order of the 
moment and the second the variate. The second subscript can be omitted 
when there is no ambiguity. 

7.3 Moments about the Mean. The most important set of moments in 
statistical theory is obtained by shifting the origin to the arithmetic mean. 
For a population of M, we have 

(7.7)  $\mu_r = \frac{1}{M}\sum_i f_i (x_i - \mu)^r, \quad r = 0, 1, 2, \ldots$

or, when M is infinite, 

(7.8)  $\mu_r = \sum_i p_i (x_i - \mu)^r$

For r = 0 we have 

(7.9)  $\mu_0 = 1$

For r = 1, 

(7.10)  $\mu_1 = \frac{1}{M}\sum_i f_i (x_i - \mu) = \frac{1}{M}\sum_i f_i x_i - \mu = 0$

For r = 2, 

$\mu_2 = \frac{1}{M}\sum_i f_i (x_i - \mu)^2$

which is the population variance and is commonly denoted by $\sigma^2$. 
For a sample of N, the corresponding moments are 

(7.11)  $m_r = \frac{1}{N}\sum_i f_i (x_i - \bar{x})^r$

and here again $m_0 = 1$, $m_1 = 0$, and $m_2$ is the sample variance. Analogous 
formulas hold for the moments $m_{r,u}$ in terms of u. 
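As a rough illustration of (7.11), the following Python sketch computes moments about the mean for grouped data and confirms that the zeroth moment is 1 and the first is 0. The name `central_moment` and the data are hypothetical, introduced only for this example.

```python
def central_moment(values, freqs, r):
    """r-th moment about the mean: sum(f * (x - mean)**r) / sum(f)."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * (x - mean) ** r for x, f in zip(values, freqs)) / n

# Hypothetical symmetric distribution (not taken from the text).
x = [1, 2, 3, 4]
f = [1, 3, 3, 1]

m0 = central_moment(x, f, 0)  # always 1
m1 = central_moment(x, f, 1)  # always 0
m2 = central_moment(x, f, 2)  # the variance
print(m0, m1, m2)
```

Here the mean is 2.5 and the variance works out to 0.75.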

If we think of weights proportional to the frequencies in each class suspended 
along a horizontal bar at the class marks, the bar will balance at its center of 
gravity which is at the point $\bar{x}$, equal to $m_1'$. Also if the bar is rotated about 
its center of gravity, its radius of gyration is the square root of the second 
moment $m_2$, and so is equal to the standard deviation of the distribution. 
There is no simple mechanical analogy for the higher moments. 

7.4 Relations between the $m_r'$ and the $m_r$. Even in the u variate, the 
$m_r$ are troublesome to calculate directly from the definition, because $\bar{u}$ will 
usually have to be taken out to several figures, and then $u - \bar{u}$ has to be 
squared, cubed, or raised to even higher powers. Therefore, instead of computing 
the $m_r$ directly we first find the much simpler $m_r'$. By using the 
Binomial Theorem of algebra for a positive integral index, it is easy to obtain 
a series of expressions for $m_r$ in terms of $m_r'$. If we work in the u variate 
(and drop the subscript u), we have, for example, 

$m_2 = \frac{1}{N}\sum_i f_i (u_i - \bar{u})^2$

$= m_2' - 2\bar{u}m_1' + \bar{u}^2$

But $m_1' = \bar{u}$, so that 

(7.12)  $m_2 = m_2' - (m_1')^2$
The student should be able to prove, by writing out the expanded forms of 
$(u_i - \bar{u})^3$ and $(u_i - \bar{u})^4$, that 

(7.13)  $m_3 = m_3' - 3m_2'm_1' + 2(m_1')^3$

and 

(7.14)  $m_4 = m_4' - 4m_3'm_1' + 6m_2'(m_1')^2 - 3(m_1')^4$

Having obtained the moments in terms of u, it is merely necessary to multiply 
by the appropriate power of c to get the moments in terms of x. Thus, 

$m_{2,x} = \frac{1}{N}\sum_i f_i (x_i - \bar{x})^2$

$= \frac{1}{N}\sum_i f_i (x_0 + cu_i - x_0 - c\bar{u})^2$

by (4.3) and (4.4), 

(7.15)  $= c^2 m_{2,u}$

It is easy to show similarly that 

(7.16)  $m_{r,x} = c^r m_{r,u}$

These moments are therefore not affected by a change of origin, but only by 
a change of scale. 

Similar relations hold for the population moments, for example, 

(7.17)  $\mu_2 = \mu_2' - (\mu_1')^2$

(7.18)  $\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2(\mu_1')^3$

(7.19)  $\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'(\mu_1')^2 - 3(\mu_1')^4$
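The statement that moments about the mean are unaffected by a change of origin and scale only by a power of c can be checked numerically. The sketch below is illustrative: the origin, class interval, and frequencies are assumed values, and `central_moment` is a helper written for this example.

```python
def central_moment(values, freqs, r):
    """r-th moment about the mean for grouped data."""
    n = sum(freqs)
    mean = sum(f * v for v, f in zip(values, freqs)) / n
    return sum(f * (v - mean) ** r for v, f in zip(values, freqs)) / n

# Hypothetical working origin x0, class interval c, and frequencies.
x0, c = 74.5, 10.0
u = [-2, -1, 0, 1, 2]
f = [1, 4, 6, 3, 2]
x = [x0 + c * ui for ui in u]  # class marks corresponding to the u-values

for r in (2, 3, 4):
    in_x = central_moment(x, f, r)
    from_u = c ** r * central_moment(u, f, r)
    print(r, in_x, from_u)  # the two columns agree up to rounding error
```

Changing `x0` leaves every printed value unchanged, while changing `c` rescales the rth moment by the rth power of c.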

7.5 Calculation of Third and Fourth Moments. The method is a direct 
extension of that already used for the mean and the variance. The details 
of the calculation for the grades of Table 2 are set out in Table 26. The 
column $fu^2$ is found by multiplying $fu$ by $u$, $fu^3$ by multiplying $fu^2$ by $u$, and 
so on. The last column is used for Charlier's check, which should be applied 
as soon as the other columns have been computed and added. The check 
depends on the identity: 

(7.20)  $\sum f(u + 1)^4 = \sum fu^4 + 4\sum fu^3 + 6\sum fu^2 + 4\sum fu + \sum f$

Table 26

      Data                        Computations
     x      f      u      fu    fu^2    fu^3    fu^4   f(u+1)^4
   34.5     2     -4      -8      32    -128     512      162
   44.5     3     -3      -9      27     -81     243       48
   54.5    11     -2     -22      44     -88     176       11
   64.5    20     -1     -20      20     -20      20        0
   74.5    32      0       0       0       0       0       32
   84.5    25      1      25      25      25      25      400
   94.5     7      2      14      28      56     112      567
   Sums   100            -20     176    -236    1088     1220   (for Charlier's check)
   Sums/N   1          -0.20    1.76   -2.36   10.88

The value u + 1 for any row except the last is the value of u in the next 
following row. The numbers are raised to the fourth power and multiplied 
by f. For the data in Table 26, $\sum f(u + 1)^4 = 1220$, and the sum of terms 
on the right-hand side of (7.20) is 1088 + 4(-236) + 6(176) + 4(-20) 
+ 100, which is also 1220. 
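Charlier's check for Table 26 is easy to reproduce in code. The sketch below uses the u-values and frequencies of Table 26; the helper name `col_sum` is an assumption made for this illustration.

```python
# u-values and frequencies from Table 26.
u = [-4, -3, -2, -1, 0, 1, 2]
f = [2, 3, 11, 20, 32, 25, 7]

def col_sum(power):
    """Column total: sum of f * u**power (power 0 gives the total frequency)."""
    return sum(fi * ui ** power for ui, fi in zip(u, f))

sums = [col_sum(p) for p in range(5)]  # sums of f, fu, fu^2, fu^3, fu^4

# Left and right sides of identity (7.20).
lhs = sum(fi * (ui + 1) ** 4 for ui, fi in zip(u, f))
rhs = sums[4] + 4 * sums[3] + 6 * sums[2] + 4 * sums[1] + sums[0]
print(sums, lhs, rhs)
```

Both sides come to 1220, in agreement with the text. As the text goes on to say, agreement does not rule out compensating errors.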

This check does not insure accuracy, because compensating errors might 
occur, but if the check is satisfied one feels confident to proceed. We divide 
the column sums by N to get the $m_{r,u}'$, and then compute the $m_{r,u}$ by (7.12), 
(7.13), and (7.14). Thus, 

$m_{2,u} = 1.76 - (-0.20)^2 = 1.72$

$m_{3,u} = -2.36 - 3(1.76)(-0.20) + 2(-0.20)^3 = -1.32$

$m_{4,u} = 10.88 - 4(-2.36)(-0.20) + 6(1.76)(-0.20)^2 - 3(-0.20)^4 = 9.41$

As a check on the calculation we may compute $m_4'$ by the relation 

$m_4' = \frac{1}{N}\sum_i f_i [(u_i - m_1') + m_1']^4$

(7.21)  $= m_4 + 4m_3 m_1' + 6m_2 (m_1')^2 + (m_1')^4$

(We have dropped the subscript u, for convenience of printing.) This 
check can be handled readily on a computing machine. 
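The conversion from moments about the origin to moments about the mean, together with the reverse check (7.21), can be sketched as follows. The function name `central_from_raw` is hypothetical; the input values are the column sums of Table 26 divided by N.

```python
def central_from_raw(m1p, m2p, m3p, m4p):
    """Apply (7.12), (7.13), and (7.14) to moments about the origin."""
    m2 = m2p - m1p ** 2
    m3 = m3p - 3 * m2p * m1p + 2 * m1p ** 3
    m4 = m4p - 4 * m3p * m1p + 6 * m2p * m1p ** 2 - 3 * m1p ** 4
    return m2, m3, m4

# Moments about the origin in the u variate, from Table 26.
m1p, m2p, m3p, m4p = -0.20, 1.76, -2.36, 10.88
m2, m3, m4 = central_from_raw(m1p, m2p, m3p, m4p)

# Reverse check by (7.21): this should reproduce the fourth moment about the origin.
m4p_back = m4 + 4 * m3 * m1p + 6 * m2 * m1p ** 2 + m1p ** 4
print(m2, m3, m4, m4p_back)
```

The results agree with the text's 1.72, -1.32, and 9.41 (9.4096 before rounding), and the reverse check returns 10.88.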

7.6 Sheppard's Corrections for Grouping Errors. As explained in §6.9, 
the variance may be corrected (under suitable conditions) for the error caused 
by grouping. In the calculation of the higher moments the same assumption 
is made that all the values in any class may be taken as equal to the class 
mark, and this assumption produces an appreciable error in the moments of 
even order. It may be shown that the corrections to the moments, in terms 
of the u variate, are 

corrected $m_2$ = uncorrected $m_2$ - 1/12 

corrected $m_4$ = uncorrected $m_4$ - ½ (uncorrected $m_2$) + 7/240 

(1/12 = 0.08333, 7/240 = 0.02917) 

The third moment needs no correction. As applied to Table 26, 

corrected $m_2$ = 1.720 - 0.083 = 1.637 

and 

corrected $m_4$ = 9.410 - (1.720)/2 + 0.029 = 8.579 
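A minimal sketch of the two corrections, in units of the class interval (the u variate), is given below. The function name `sheppard_corrections` is an assumption for this illustration; the inputs are the uncorrected moments found for Table 26.

```python
def sheppard_corrections(m2, m4):
    """Sheppard's corrections for the second and fourth moments (u variate)."""
    m2_corrected = m2 - 1 / 12
    m4_corrected = m4 - m2 / 2 + 7 / 240
    return m2_corrected, m4_corrected

# Uncorrected moments from Table 26.
m2c, m4c = sheppard_corrections(1.720, 9.410)
print(round(m2c, 3), round(m4c, 3))
```

The rounded results match the 1.637 and 8.579 obtained in the text.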

As remarked before, Sheppard's corrections are valid only for hump-backed 
(“bell-shaped”) distributions with flat tails. They are not applicable to 
the J-shaped or U-shaped types. Moreover, they are a refinement which 
may not be consistent with the degree of accuracy in the original data or with 
the errors due to sampling fluctuation. 

7.7 Standard Units. For the purposes of theoretical statistics it is often 
very convenient to express deviations from the mean in terms of the standard 
deviation as unit. We shall denote deviations so expressed by z, where 

(7.22)  $z = (x - \bar{x})/s_x$

and say that they are in standard units, or standardized. 

The significant characteristic of the z variate is its independence of the unit 
in which the original measurements were taken. For example, suppose we 
were concerned with a set of lengths. One distribution of variates would 
result if the measurements were made in feet. In this case $x'$, $\bar{x}$, and $s_x$ 
would also be in feet. If the measurements were taken in inches, then $x'$, $\bar{x}$, 
and $s_x$ would be in inches, and each of these values would be, numerically, 
twelve times as large as the corresponding numbers in the first distribution. 
However, the variates expressed in standard units would be the same for the 
two distributions. Thus if 

$\bar{x}$ = 50 ft = 50(12) in 

and 

$s_x$ = 5 ft = 5(12) in 

then for an individual measurement of x = 60 ft = 60(12) in, we have 

$z = \frac{10 \text{ ft}}{5 \text{ ft}} = \frac{10(12) \text{ in}}{5(12) \text{ in}} = 2$

It is obvious, therefore, that standard units provide a basis for comparing 
distributions. 
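The feet-and-inches example can be restated as a short Python sketch. The function name `standardize` is an assumption for this illustration; the numbers are those of the example (mean 50 ft, standard deviation 5 ft, measurement 60 ft).

```python
def standardize(x, mean, sd):
    """Deviation from the mean expressed in standard-deviation units."""
    return (x - mean) / sd

# The same measurement expressed in feet and in inches.
z_feet = standardize(60, 50, 5)
z_inches = standardize(60 * 12, 50 * 12, 5 * 12)
print(z_feet, z_inches)
```

Both calls give z = 2, since the factor 12 cancels between numerator and denominator.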

With the aid of a computing machine, a distribution may easily be transformed 
into standard units by the so-called continuous process. To illustrate, 
consider the data of Table 27, representing a distribution of weights 
(to the nearest pound) of 1000 8-year-old Glasgow schoolgirls. 

Table 27. Standard Values 

   x, Class Mark (lb)      f         z
        29.5               1      -3.154
        33.5              14      -2.461
        37.5              56      -1.768
        41.5             172      -1.076
        45.5             245      -0.383
        49.5             263       0.310
        53.5             156       1.002
        57.5              67       1.695
        61.5              23       2.388
        65.5               3       3.081

The mean and standard deviation are 47.712 lb and 5.774 lb, respectively, so that 

$z = \frac{x - 47.712}{5.774} = 0.17318x - 8.2627$

Referring to the discussion of the continuous method given in the Introduction, 
we observe that here k = -8.2627, n = 0.17318, and we desire the 
values of z corresponding to the values of x given in Table 27. For the values 
of x such that nx < |k| we write the above relation in the form 

$-z = 8.2627 - 0.17318x$

The procedure* now is to register 8.262700 on the product register, punch the 
constant factor 0.17318 on the keyboard, and then by turning the crank 
backward so that the successive values of x appear on the revolution register, 
we subtract from |k| the products of this multiplier and the values of x. The 
various values of x are built over from one to another without clearing the 
dial. The resulting values of -z are read at each stage from the product 
register until we get -z = 0.383. From here, nx > |k|, so we clear the dials 
and start over using the original form of the relation between x and z. We now 
register -8.262700 on the product register by turning the crank backward, 
punch 0.17318 on the keyboard, and turn the crank forward to form the 
values of x on the revolution register. The values of z are read as before 
from the product register at each stage of the build-over process. In this 
way the set of standard values in Table 27 is obtained. We see from this table 
that a range of z = ±3 takes in practically all the values. This is typical of 
the more common distributions. 

* If automatic machines are available the instructor will explain the procedure. 
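On a modern machine the continuous process reduces to evaluating the linear relation z = nx + k at each class mark. The sketch below uses the text's rounded constants n = 0.17318 and k = -8.2627; the variable names are assumptions made for this illustration.

```python
# Constants of the linear relation z = n * x + k, as given in the text.
n, k = 0.17318, -8.2627

# Class marks of Table 27: 29.5 lb to 65.5 lb in steps of 4 lb.
class_marks = [29.5 + 4 * i for i in range(10)]

# Standard values, rounded to three decimals as in the table.
z_values = [round(n * x + k, 3) for x in class_marks]
for x, z in zip(class_marks, z_values):
    print(x, z)
```

With these rounded constants the output reproduces the z column of Table 27, running from -3.154 to 3.081.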

Some writers use X to denote the variate that we have called x, and use x 
to mean $X - \bar{X}$. In this notation $z = x/s_x$. Occasionally in later chapters 
we shall find it convenient to designate deviations from the mean by x instead 
of $x'$. If so, we shall state that the origin of x is taken at $\bar{x}$. 

In educational work it is sometimes advisable to standardize scores, especially 
for the purpose of combining scores on different tests, which may show 
different degrees of variability. In order to avoid negative numbers, it is 
customary to multiply the z value by 10 and add 50 and thus obtain the 
standard Z score, where Z = 10z + 50. (A z-value below -5 seldom 
occurs.) 

Example. (Walker). The raw scores obtained by pupils A, B, C, and D on three tests 
are given below. The means and standard deviations for these tests for the whole class are 
also given. Compute a Z score for each pupil. 

                      Raw Score
   Pupil          Test I      II      III     Composite Z
   A                42        92      10          52
   B                40        90      14          55
   C                35        93      16          59
   D                45        81      18          54
   Class $\bar{x}$        31.2      86.5    14.7
   Class $s_x$        11.5       3.6     2.4

The z values for A on the three tests are (42 - 31.2)/11.5 = 0.939, 
(92 - 86.5)/3.6 = 1.528, and (10 - 14.7)/2.4 = -1.958, respectively. 
The corresponding Z scores are 59.4, 65.3, and 30.4, 
the arithmetic mean of which is 51.7. This is rounded off to 52. The student should 
check the other values given, as an exercise. 
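The composite Z scores can be checked with a short script. This sketch assumes a class mean of 86.5 for Test II, the value used in the worked line for pupil A, and covers pupils A, B, and C only; the names `scaled_z` and `composite` are assumptions for this illustration.

```python
# Class means and standard deviations for Tests I, II, III.
means = [31.2, 86.5, 14.7]
sds = [11.5, 3.6, 2.4]

# Raw scores of three of the pupils.
raw = {"A": [42, 92, 10], "B": [40, 90, 14], "C": [35, 93, 16]}

def scaled_z(x, mean, sd):
    """Standard Z score: Z = 10z + 50, where z = (x - mean) / sd."""
    return 10 * (x - mean) / sd + 50

composite = {}
for pupil, scores in raw.items():
    zs = [scaled_z(x, m, s) for x, m, s in zip(scores, means, sds)]
    composite[pupil] = round(sum(zs) / len(zs))  # mean of the three Z scores

print(composite)
```

The rounded composites are 52, 55, and 59 for A, B, and C respectively.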


7.8 Moments in Standard Units. When expressed in standard units, 
population moments are usually denoted by the Greek letter $\alpha$ (alpha). 
Thus 

(7.23)  $\alpha_r = \frac{1}{M}\sum_i f_i z_i^r$

$= \frac{1}{\sigma^r M}\sum_i f_i (x_i - \mu)^r$,  by definition of $z_i$, 

$= \mu_r/\sigma^r$,  by (7.7) 

Hence 

(7.24)  $\alpha_1 = 0$,  $\alpha_2 = 1$,  $\alpha_3 = \mu_3/\sigma^3$,  $\alpha_4 = \mu_4/\sigma^4$

The $\alpha$'s are all pure numbers, independent of whatever unit x may be expressed 
in. Thus if x is in ft, $\mu_3$ is in ft$^3$ but $\sigma$ is also in ft, so that $\mu_3$ and $\sigma^3$ are in 
the same units. 

In the notation used by Karl Pearson, 

(7.25)  $\beta_1 = \alpha_3^2$,  $\beta_2 = \alpha_4$

and in that of R. A. Fisher 

(7.26)  $\gamma_1 = \alpha_3$,  $\gamma_2 = \alpha_4 - 3$

Corresponding to the $\alpha$'s we can define standardized moments for a sample, 
namely, 

(7.27)  $a_r = \frac{1}{N}\sum_i f_i z_i^r$,  $z_i = (x_i - \bar{x})/s_x$

or 

(7.28)  $a_r = m_{r,x}/s_x^r = m_{r,u}/s_u^r$

However, if we wish to estimate from a sample the moments for the population 
it is better to use some slightly modified moments, known as k-statistics. 

7.9 The k-statistics. A statistic is a quantity calculated from the observations 
on a sample and used to estimate some characteristic of the parent 
population. These characteristics are usually parameters, that is, they are 
unknown constants which appear in the equation of the frequency curve 
that is assumed to represent the distribution, but which vary from one distribution 
of the same type to another. The population moments are parameters 
which occur in the equations of various commonly used frequency 
curves. The corresponding moments for samples will be approximations to 
the population moments, but it has been shown that, for samples which are 
not large, better approximations to the most important moments are provided 
by the k-statistics, 

(7.29)
$k_1 = \bar{x} = m_1'$

$k_2 = Ns^2/(N - 1) = Nm_2/(N - 1)$

$k_3 = N^2 m_3/[(N - 1)(N - 2)]$

$k_4 = N^2[(N + 1)m_4 - 3(N - 1)m_2^2]/[(N - 1)(N - 2)(N - 3)]$

These are estimates of $\mu_1'$, $\mu_2$, $\mu_3$, and $\mu_4 - 3\mu_2^2$ respectively. They are 
said to be unbiased estimates because if one of them, say $k_3$, were calculated 
for a large number of samples from a population with known moments, the 
mean of all the values of $k_3$ would be the true parameter $\mu_3$. If N is very 
large, of course, factors like $N/(N - 1)$ and $N^2/[(N - 1)(N - 2)]$ are 
practically equal to 1, and then $k_2$ is practically the same as $m_2$, $k_3$ as $m_3$, 
and $k_4$ as $m_4 - 3m_2^2$. 

7.10 Skewness. The values of $\alpha_1$ and $\alpha_2$ tell us nothing about a population 
since $\alpha_1 = 0$ and $\alpha_2 = 1$ for all distributions. But $\alpha_3$ and $\alpha_4$ depend on 
the shape of the frequency curve, and therefore can be used to distinguish 
between different types. Thus 

$\alpha_3 = \mu_3/\sigma^3$

is a measure of asymmetry about the mean, or skewness. If the values of 
x are distributed symmetrically about the mean, there will be for every positive 
value of $x - \mu$ a corresponding negative value. When these are cubed 
they retain their signs and cancel on addition, so that $\mu_3 = 0$. But if the distribution 
has a longer tail out to the right than to the left, the positive values 
of $(x - \mu)^3$ usually outweigh the negative ones, so that $\mu_3 > 0$. If the 
distribution has a longer tail to the left, $\mu_3 < 0$. Since $\mu_3$ depends on the unit 
of x, and since the symmetry or lack of it is not a function of the unit of 
measurement, we divide $\mu_3$ by $\sigma^3$ to get the pure number $\alpha_3$. Then a curve 
with $\alpha_3 > 0$ is said to have positive skewness, and one with $\alpha_3 < 0$ negative 
skewness. These cases, along with $\alpha_3 = 0$, are illustrated in Fig. 23. For 
most distributions $\alpha_3$ will lie between -2 and 2. 

[Fig. 23]

With a sample, we can estimate the skewness of the population by using 
the statistic 

(7.30)  $g_1 = k_3/k_2^{3/2}$

as an estimate of $\gamma_1$ (= $\alpha_3$). For large N, $g_1$ is practically the same as 
$m_3/m_2^{3/2} = m_3/s^3$. 


Table 28. Three Distributions 

              A       B       C
      u       f       f       f
     -3       0       1       0
     -2       3       1       1
     -1       6       5      10
      0       7      11       6
      1       6       5       5
      2       3       1       2
      3       0       1       1
   Sums      25      25      25


The data in Table 28 (simplified and adapted from actual experimental 
data) give frequencies corresponding to u-values for three different samples 
of 25. For all three the mean is zero and the standard deviation in terms of 
u is 1.2, but for the first two the skewness is clearly 0 (from the obvious 
symmetry) and for the third it is 0.74. Histograms for these three distributions 
are shown in Fig. 24. 
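The skewness values quoted for Table 28 can be reproduced from the k-statistics. In the sketch below the B column is read as the symmetric set 1, 1, 5, 11, 5, 1, 1, which is the reading consistent with the stated total of 25; the function names are assumptions made for this illustration.

```python
u = [-3, -2, -1, 0, 1, 2, 3]
tables = {
    "A": [0, 3, 6, 7, 6, 3, 0],
    "B": [1, 1, 5, 11, 5, 1, 1],  # assumed symmetric reading, total 25
    "C": [0, 1, 10, 6, 5, 2, 1],
}

def central_moment(values, freqs, r):
    n = sum(freqs)
    mean = sum(f * v for v, f in zip(values, freqs)) / n
    return sum(f * (v - mean) ** r for v, f in zip(values, freqs)) / n

def g1(values, freqs):
    """Sample skewness: third k-statistic over the 3/2 power of the second."""
    n = sum(freqs)
    k2 = n * central_moment(values, freqs, 2) / (n - 1)
    k3 = n ** 2 * central_moment(values, freqs, 3) / ((n - 1) * (n - 2))
    return k3 / k2 ** 1.5

for name, f in tables.items():
    sd = central_moment(u, f, 2) ** 0.5
    print(name, sd, g1(u, f))
```

Each distribution has standard deviation 1.2; the skewness is 0 for A and B and about 0.74 for C.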

7.11 Other Measures of Skewness. For an unsymmetrical distribution the 
distance between the mean and mode may be used to measure the degree of 
asymmetry or skewness, because the mean and mode coincide in a symmetrical 
distribution. Since we wish any measure of skewness to be a pure number, 
we express this distance in units of the standard deviation, thus 
(mean - mode)/$\sigma$. 

[Fig. 24. Histograms of distributions A, B, and C of Table 28]

This measure was suggested by Karl Pearson. For a certain theoretical 
frequency curve, known as Type III of Pearson's set of curves, the following 
relation can be proved mathematically (see Part Two, page 106): 

(7.31)  (mean - mode)/$\sigma$ = $\alpha_3/2$

Because of this relation $\alpha_3/2$ is sometimes used, instead of $\alpha_3$, as the measure 
of skewness. Equation (7.31) can also be used for finding approximately 
the mode of a moderately skew distribution. 

Another measure of skewness, suggested by Bowley, is based on the fact 
that for a positively skew distribution the third quartile is farther from the 
median than the first quartile, that is, $Q_3 - Q_2 > Q_2 - Q_1$. The measure 
adopted, which is also a pure number, is $[(Q_3 - Q_2) - (Q_2 - Q_1)]/(Q_3 - Q_1)$ 
$= (Q_3 + Q_1 - 2Q_2)/(Q_3 - Q_1)$. This number is not as dependent as $\alpha_3$ 
on the tails of the distribution. 
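Bowley's measure depends only on the three quartiles, as the following sketch shows. The function name `bowley_skewness` and the quartile values are hypothetical, chosen only to illustrate the formula.

```python
def bowley_skewness(q1, q2, q3):
    """Bowley's quartile measure of skewness: (Q3 + Q1 - 2*Q2) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * q2) / (q3 - q1)

# Hypothetical quartiles: a symmetric case and a positively skew case.
symmetric = bowley_skewness(10, 20, 30)
skewed = bowley_skewness(10, 15, 30)
print(symmetric, skewed)
```

The symmetric case gives 0. In the second case the third quartile lies farther from the median than the first, and the measure is 0.5.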

7.12 Kurtosis. The fourth standardized moment, $\alpha_4$, is a measure of 
a property of the distribution called kurtosis. The name comes from a Greek 
word meaning “humped” and a high value of $\alpha_4$ was thought to mean a sharply 
humped or peaked distribution and a low value of $\alpha_4$ a relatively flat-topped 
distribution. It is now recognized that the shape of the hump has less to 
do with the value of $\alpha_4$ than the length (and height) of the tails. Indeed, 
I. Kaplansky (Reference 2) has shown that kurtosis has not necessarily anything 
to do with peakedness. A distribution with a perfectly flat top may 
have infinite kurtosis and one with an extremely sharp and high central peak 
may have a low value of $\alpha_4$, although these are artificially manufactured 
examples. For the so-called “normal curve” $\alpha_4 = 3$, so that $\gamma_2 = 0$ (see 
equation (7.26)), and a curve with $\alpha_4 > 3$ has positive kurtosis while one 
with $\alpha_4 < 3$ has negative kurtosis. Fig. 25 shows three curves with identical 
values of $\sigma$ and $\alpha_3$, but differing in their values of $\alpha_4$. 

For a sample, the kurtosis is measured by the statistic 

(7.32)  $g_2 = k_4/k_2^2$

which is an estimate of $\gamma_2$. For the distributions A and B of Table 28, illustrated 
in Fig. 24, the calculated values of $g_2$ are -0.85 and 1.44 respectively. 
For any distribution (see Reference 3) 

(7.33)  $\alpha_4 \geq \alpha_3^2 + 1$
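The kurtosis statistics quoted for distributions A and B can be recomputed from the k-statistics. As before, the B column is taken as the symmetric set 1, 1, 5, 11, 5, 1, 1, the reading consistent with a total of 25; the function names are assumptions for this illustration.

```python
u = [-3, -2, -1, 0, 1, 2, 3]
freq_a = [0, 3, 6, 7, 6, 3, 0]
freq_b = [1, 1, 5, 11, 5, 1, 1]  # assumed symmetric reading, total 25

def central_moment(values, freqs, r):
    n = sum(freqs)
    mean = sum(f * v for v, f in zip(values, freqs)) / n
    return sum(f * (v - mean) ** r for v, f in zip(values, freqs)) / n

def g2(values, freqs):
    """Sample kurtosis: fourth k-statistic over the square of the second."""
    n = sum(freqs)
    m2 = central_moment(values, freqs, 2)
    m4 = central_moment(values, freqs, 4)
    k2 = n * m2 / (n - 1)
    k4 = n ** 2 * ((n + 1) * m4 - 3 * (n - 1) * m2 ** 2) / (
        (n - 1) * (n - 2) * (n - 3)
    )
    return k4 / k2 ** 2

print(round(g2(u, freq_a), 2), round(g2(u, freq_b), 2))
```

The two values round to -0.85 and 1.44, as stated in the text.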


It may be shown that for a large class of theoretical frequency curves (the 
Pearson system, see Part Two) the mode Mo is related to the mean $\mu$, the 
standard deviation $\sigma$, the skewness $\gamma_1$, and the kurtosis $\gamma_2$, by the relation 

(7.34)  $Mo = \mu - \frac{\sigma\gamma_1(\gamma_2 + 6)}{2(5\gamma_2 - 6\gamma_1^2 + 6)}$

which can be used to calculate a value for the mode. 

7.13 Specimen Computation of Moments. The main characteristics of 
a sample distribution are summed up in the five quantities N, $\bar{x}$, $s_x$, $g_1$ and $g_2$. 
In fitting any one of a considerable variety of theoretical curves to an empirical 
distribution, these quantities (or some of them) are used to estimate the 
parameters of the curve. Table 29 shows a form of work-sheet which may be 
used. If the work is done on a computing machine, only the totals of columns 
4 to 8 need be recorded. 

After obtaining $\bar{u}$, $m_2'$, $m_3'$, $m_4'$, the next step is to calculate $m_2$, $m_3$, and 
$m_4$ by equations (7.12) to (7.14). Then the k-statistics are found by (7.29) 
and $g_1$ and $g_2$ by (7.30) and (7.32). All the calculations may be made in the 
u-variable, including the applying of Sheppard's corrections to $m_2$ and $m_4$. 
The variance in terms of x is, however, $c^2$ times $m_2$ (in this example c = 1) 
and the mean in terms of x is $c\bar{u} + x_0$. We have 

$\bar{x}$ = 0.443 + 69.5 = 69.94 in. 

$\bar{u}^2$ = 0.19625,  $\bar{u}^3$ = 0.08694,  $\bar{u}^4$ = 0.03851 

$m_2 = m_2' - \bar{u}^2$ = 9.901 - 0.19625 = 9.70475 
Table 29. Computation of Moments of Distribution of Span 
in Inches Among Adult Males 

    x_i       f      u      uf      u^2 f      u^3 f      u^4 f    (u+1)^4 f
   58.5       1    -11     -11        121     -1,331     14,641      10,000
   59.5       2    -10     -20        200     -2,000     20,000      13,122
   60.5       1     -9      -9         81       -729      6,561       4,096
   61.5       6     -8     -48        384     -3,072     24,576      14,406
   62.5       7     -7     -49        343     -2,401     16,807       9,072
   63.5      22     -6    -132        792     -4,752     28,512      13,750
   64.5      55     -5    -275      1,375     -6,875     34,375      14,080
   65.5     111     -4    -444      1,776     -7,104     28,416       8,991
   66.5     146     -3    -438      1,314     -3,942     11,826       2,336
   67.5     182     -2    -364        728     -1,456      2,912         182
   68.5     229     -1    -229        229       -229        229           0
   69.5     265      0       0          0          0          0         265
   70.5     263      1     263        263        263        263       4,208
   71.5     217      2     434        868      1,736      3,472      17,577
   72.5     176      3     528      1,584      4,752     14,256      45,056
   73.5     132      4     528      2,112      8,448     33,792      82,500
   74.5      82      5     410      2,050     10,250     51,250     106,272
   75.5      48      6     288      1,728     10,368     62,208     115,248
   76.5      20      7     140        980      6,860     48,020      81,920
   77.5      16      8     128      1,024      8,192     65,536     104,976
   78.5      12      9     108        972      8,748     78,732     120,000
   79.5       3     10      30        300      3,000     30,000      43,923
   80.5       1     11      11        121      1,331     14,641      20,736
   81.5       2     12      24        288      3,456     41,472      57,122
   82.5       1     13      13        169      2,197     28,561      38,416
   Sums   2,000            886     19,802     35,710    661,058     928,254
   (Sums)/N              0.443      9.901     17.855    330.529
                         $\bar{u}$       $m_2'$       $m_3'$       $m_4'$

Charlier's check: 

$\sum(u + 1)^4 f = \sum u^4 f + 4\sum u^3 f + 6\sum u^2 f + 4\sum uf + \sum f$

928,254 = 661,058 + 4(35,710) + 6(19,802) + 4(886) + 2,000 
Corrected $m_2$ = 9.6214 

$s_x^2$ = 9.6214,  $s_x$ = 3.102 in. 

$m_3 = m_3' - 3m_2'\bar{u} + 2\bar{u}^3$

= 17.855 - 3(9.901)(0.443) + 2(0.08694) 

= 4.8704 

$m_4 = m_4' - 4m_3'\bar{u} + 6m_2'\bar{u}^2 - 3\bar{u}^4$

= 330.529 - 4(17.855)(0.443) + 6(9.901)(0.19625) - 3(0.03851) 

= 310.432 

Corrected $m_4$ = 310.432 - 4.852 + 0.029 = 305.609 

$k_2$ = (2000/1999)(9.6214) = 9.6262,  $(k_2)^{1/2}$ = 3.103 

$k_3$ = 4.8777,  $k_4$ = 311.519 - 278.131 = 33.39 

$g_1$ = 4.878/29.87 = 0.16,  $g_2$ = 33.39/92.66 = 0.36 

The estimated mean, standard deviation, skewness, and kurtosis for the population 
of spans are therefore 69.94 in, 3.10 in, 0.16, and 0.36, respectively. 
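The column totals of Table 29, Charlier's check, and the moments up to the Sheppard corrections can be recomputed in a few lines. This is a sketch for checking the work-sheet: the variable names are assumptions, and the k-statistics are left out so that only steps fully specified above are reproduced.

```python
# Frequencies of Table 29 for u = -11, ..., 13 (class marks 58.5 to 82.5 in).
f = [1, 2, 1, 6, 7, 22, 55, 111, 146, 182, 229, 265, 263,
     217, 176, 132, 82, 48, 20, 16, 12, 3, 1, 2, 1]
u = list(range(-11, 14))
N = sum(f)

# Column totals: sums of f, uf, u^2 f, u^3 f, u^4 f.
S = [sum(fi * ui ** p for ui, fi in zip(u, f)) for p in range(5)]
charlier = sum(fi * (ui + 1) ** 4 for ui, fi in zip(u, f))

# Moments about the working origin, then about the mean, by (7.12) to (7.14).
u_bar, m2p, m3p, m4p = (S[p] / N for p in range(1, 5))
m2 = m2p - u_bar ** 2
m3 = m3p - 3 * m2p * u_bar + 2 * u_bar ** 3
m4 = m4p - 4 * m3p * u_bar + 6 * m2p * u_bar ** 2 - 3 * u_bar ** 4

# Sheppard's corrections, then the mean and standard deviation in inches (c = 1).
m2_corr = m2 - 1 / 12
m4_corr = m4 - m2 / 2 + 7 / 240
x_bar = 69.5 + u_bar
s_x = m2_corr ** 0.5
print(S, charlier, round(x_bar, 2), round(s_x, 3), round(m3, 4))
```

The totals agree with the table (2,000; 886; 19,802; 35,710; 661,058), the check total is 928,254, and the mean and standard deviation come out as 69.94 in and 3.102 in.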


Exercises 

1. Calculate $m_1'$, $m_2$, and $m_3$ for the following distributions: 


(a) (b) 


z 

f 

X 

/ 

0 

1 

-3 

1 

1 

3 

~2 

3 

2 

5 

-1 

5 

3 


0 

6 

4 

6 

1 

3 


2 

2 



2. Prove the relations (7.13) and (7.14). Show that these also hold if the moments 
are expressed in the x-variate. 

3. Prove equation (7.16). 

4. Show that for any distribution expressed in standard units, the mean is 0 and the 
standard deviation is 1. 

5. Show that the moments $m_r$ are unaffected by a change of origin for the variable x 
and the moments $a_r$ are also unaffected by a change of unit. 

6. Calculate $m_2$, $m_3$, and $m_4$ for the distribution of monthly rainfall, Iowa City (Table 6, 
§1.8) using the scheme of Table 29. 

7. Calculate the k-statistics, and $g_1$ and $g_2$ for the data of Exercise 6, according to the 
method of §7.13. 

8. Verify the values given in §§7.10 and 7.12 for the skewness of distribution C and 
the kurtosis of distributions A and B, in Table 28. 

9. The mean of scores for a group of students on a certain test was 63.7 with a standard 
deviation of 12.3. Find the Z scores for the top student, with a score of 98, and the bottom 
student, with a score of 21. Ans. 78, 15. 

10. For a class of 35 students the sum of scores on a test was 2118 and the sum of 
squares of scores 131,327. Find the Z scores corresponding to raw scores of (a) 50 and 
(b) 80. Ans. (a) 39; (b) 71. 
CHAPTER VIII 

THE NORMAL CURVE 

8.1 Frequency Curves. As we have pointed out in Chapter VII, various 
distributions encountered in practice can be more or less closely approximated 
by theoretical frequency curves. A complete discussion of such curves 
involves rather advanced mathematics, and fuller details will be found in 
Part Two (Chapter V). However, some simple ideas relating to frequency 
curves will be useful in our work. 

The curves we shall encounter will be continuous curves, specified by an 
explicit mathematical equation of the form y = f(x), where f(x) is never 
negative. The domain of x is an interval of the axis, sometimes the whole 
axis from $-\infty$ to $+\infty$, but whether or not the curve stretches out to infinity 
the area between a frequency curve and the axis is always finite. 

If a frequency curve is fitted to the histogram of a distribution with total 
frequency N, the area under the curve represents this frequency. The partial 
area under the curve between ordinates erected at x = a and x = b (Fig. 26) 
represents the number of observations (the partial frequency) corresponding 
to a value of x between a and b. It is often convenient to consider the whole 
area under a frequency curve as unity, and then the area between x = a and 
x = b represents the proportion of observed values of x lying between a and b. 
This can be interpreted as meaning the probability that a randomly selected 
observation from the population represented by the curve will have a value of 
x between these limits; hence a frequency curve with total area unity is sometimes 
called a probability curve. 

[Fig. 26]

If the domain of x ranges from $l_1$ to $l_2$ the total area is denoted mathematically 
by the symbol 

$\int_{l_1}^{l_2} f(x)\,dx$

which is read as “the integral of f(x) from $l_1$ to $l_2$.” It is proved in textbooks 
on the calculus that this integral is the limit of a sum, 

(8.1)  $\int_{l_1}^{l_2} f(x)\,dx = \lim_{\Delta x_j \to 0} \sum_j f(x_j)\,\Delta x_j$

In this sum, the $\Delta x_j$ are sub-intervals of the x-axis which together make 
up the whole interval from $l_1$ to $l_2$, and $f(x_j)$ is the value of f(x) corresponding 
to a point $x_j$ in the jth interval $\Delta x_j$. Geometrically speaking, $f(x_j)\,\Delta x_j$ is 
the area of a rectangle with base $\Delta x_j$ and height $f(x_j)$, and the sum is the total 
area of a histogram with frequencies $f(x_j)$ and class intervals $\Delta x_j$, as illustrated 
in Fig. 27. (The $\Delta x_j$ need not be equal to each other.) The limit in 
(8.1) means that all the sub-intervals are to be thought of as becoming smaller 
and smaller, ultimately all tending to zero. The limit may not exist, but if 
it does it is taken as the definition of the area under the curve y = f(x) 
between $l_1$ and $l_2$. 

Fig. 27. Illustrating Definition of Integral 

For frequency curves of the type we are considering the limit does exist, 
and the area under the curve is finite. The function f(x) is said to be integrable. 
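The limit in (8.1) can be watched numerically by summing rectangle areas over finer and finer sub-intervals. The curve used below, f(x) = 2x on the interval from 0 to 1, is a hypothetical frequency curve with total area 1, chosen only because its sums are easy to verify by hand; the function names are likewise assumptions.

```python
def left_riemann_sum(f, a, b, n):
    """Sum of f(x_j) * dx over n equal sub-intervals, x_j the left end of each."""
    dx = (b - a) / n
    return sum(f(a + j * dx) * dx for j in range(n))

def density(x):
    # Hypothetical frequency curve on [0, 1]; the area under it is exactly 1.
    return 2 * x

for n in (10, 100, 1000):
    print(n, left_riemann_sum(density, 0.0, 1.0, n))
```

For this curve the sum with n sub-intervals equals (n - 1)/n, so it takes the values 0.9, 0.99, 0.999 and tends to the area 1 as the sub-intervals shrink.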

The integral sign is merely a conventionalized S, standing for sum. 

The area between the ordinates at x = a and x = b is similarly denoted by 
$\int_a^b f(x)\,dx$, which, if $\int_{l_1}^{l_2} f(x)\,dx = 1$, represents the proportion of observations 
having a value of x such that a < x < b. 

8.2 The Normal Curve. Perhaps the most important of all frequency 
curves is the so-called normal* curve whose equation may be written 

(8.2) $y = K e^{-h^2 (x - m)^2}, \qquad -\infty < x < \infty$

where K, h, and m represent numbers whose significance will be explained 
presently and e is the number 2.71828..., which is the base of natural 
logarithms. The curve is bell-shaped and is symmetrical about the line x = m. 
It was first discovered by A. De Moivre (1667-1754), a French mathematician 
who spent 66 years of his life in England, and it was published in 1733 in a 
privately printed pamphlet, now very rare. He obtained it while working 
on certain problems in games of chance which were proposed to him by the 
gamblers of his day. Because of this origin and because the data from certain 
coin- and dice-throwing experiments closely approach it in form, it is often 
called the normal probability curve. Actual statistical use of the normal 
curve began with the work of the famous mathematical astronomers, Laplace 
(1749-1827) and Gauss (1777-1855), each of whom derived it independently 
and presumably without knowing of De Moivre’s treatment.† They found 
that it represented very well the errors of observation in the physical sciences. 
For this reason it has been called the normal curve of error, where error is 
used in the sense of a deviation from the true value. Since that time 
experience has shown that it serves quite well to describe many of the distributions 
which arise in the fields of biology, education, and sociology. Much of the 
theory of statistics is built around it. 

The moments of a theoretical distribution specified by a frequency curve 
y = f(x) can be defined by integrals. Thus, 

(8.3) $\mu_1' = \mu = \int_l^h x f(x)\,dx$

(8.4) $\mu_2 = \sigma^2 = \int_l^h (x - \mu)^2 f(x)\,dx$
      $\mu_3 = \int_l^h (x - \mu)^3 f(x)\,dx$, etc. 

where it is assumed that $\int_l^h f(x)\,dx = 1$. 


The curve represented by (8.2) approaches the x-axis at both ends without 
ever quite reaching it, so that l and h are −∞ and +∞, respectively. 
If the constant K is chosen so that $\int_{-\infty}^{\infty} f(x)\,dx = 1$, and if the moments 
are then calculated by (8.3) and (8.4), it turns out that 

(8.5) $K = h/\sqrt{\pi}$
      $\mu = m$
      $\sigma^2 = 1/(2h^2)$
      $\mu_3 = 0$
      $\mu_4 = 3/(4h^4)$

Equation (8.2) can therefore be written 

(8.6) $y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \mu)^2/(2\sigma^2)}$

The quantities μ and σ are parameters. They determine the position of 
the curve along the x-axis and the steepness of its sides, but do not affect its 
general shape and character. 

* The term “normal” used here should not be interpreted to mean that other types of 
distribution are abnormal. 

† For a more extensive history, see Reference 1, page 123, and Reference 16 of §0.4. 

8.3 Standard Form. The parameters μ and σ may be removed from the 
equation of the curve by expressing it in terms of the standardized variable 

(8.7) $z = (x - \mu)/\sigma$

When this transformation is made, and the constant is adjusted so that the 
total area under the z-curve is unity, the equation of the normal curve becomes 

(8.8) $\phi(z) = (2\pi)^{-1/2} e^{-z^2/2}$

This is called the standard form of the equation. A variate z which is 
distributed in accordance with Eq. (8.8) is said to be normally distributed with mean 
zero and standard deviation unity. This is often abbreviated as z = N(0, 1). 

8.4 Tables of Ordinates and Areas. One of the reasons for writing the 
equation in standard form is that the ordinates and areas may be tabulated 
once and for all. These tables are given in the Appendix. We see from 
(8.8) that φ(−z) = φ(z); that is, the ordinates for negative values of z 
are the same as for the corresponding positive values of z, and the curve is 
symmetrical about the ordinate at z = 0. Therefore, it is not necessary to 
tabulate φ(z) for negative values of z. 

The general shape of the normal curve may be seen from the curve (1) 
of Fig. 25 (§7.12). It approaches the horizontal axis asymptotically at each 
extremity, never quite reaching it no matter how far extended. Although 
an actual sample will always have a finite range, it is often convenient to think 
of the range in the parent population as infinite, and in fact this infinity 
leads to remarkable simplifications in more advanced mathematical statistics. 
Moreover, even in representing observed distributions the infinite range 
causes no practical difficulty because the curve comes down to the horizontal 
axis very rapidly beyond z = ±3. The combined area at the two extremities 
beyond z = ±3 is only 0.27 of 1% of the total area under the curve. 

Table I of the Appendix gives also the areas under the standard normal 
curve from z = 0 to selected positive values of z, as far as z = 4. Thus, the 
area from z = 0 to z = 1 is 0.3413. From the symmetry of the curve the 
area from −1 to 0 is the same as the area from 0 to 1. Any other areas 
required may be found by appropriate addition or subtraction of tabular 
values, remembering that the whole area under the curve from −∞ to +∞ 
is 1. For example, suppose we require the area of the “tail” of the curve 
below z = −2 (see Fig. 28), which is denoted by $\int_{-\infty}^{-2} \phi(z)\,dz$. The area from 
−∞ to −2 is 0.5 minus the area from −2 to 0, but this latter is the same as 
the area from 0 to 2, which is 0.4772. That is, 

$\int_{-\infty}^{-2} \phi(z)\,dz = 0.5 - 0.4772 = 0.0228$

Fig. 28 

Note that $\int_{-\infty}^{t} \phi(z)\,dz$ is a cumulative relative frequency. It denotes the 
fraction of the total population with a value of z less than t. It is easy to see 
from a figure that 

(8.9) $\int_{-\infty}^{t} \phi(z)\,dz = 0.5 + \int_0^{t} \phi(z)\,dz$, or $0.5 - \int_0^{|t|} \phi(z)\,dz$

according as t is positive or negative. The quantity on the left of (8.9) is 
often denoted by Φ(t), Φ being the Greek capital phi. 

For decimal values of z between the hundredths given in the table, ordinary 
linear interpolation will suffice. 
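
The table look-ups described above can be imitated numerically. The following is a minimal sketch, not a substitute for Table I: it assumes the error function `math.erf` from the Python standard library as the source of areas, and the helper names (`phi`, `area_0_to`, `big_phi`, `interpolate`) are illustrative choices, not notation from the text.

```python
import math


def phi(z: float) -> float:
    """Ordinate of the standard normal curve, Eq. (8.8)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def area_0_to(z: float) -> float:
    """Area under the standard normal curve from 0 to z, for z >= 0."""
    return 0.5 * math.erf(z / math.sqrt(2.0))


def big_phi(t: float) -> float:
    """Cumulative relative frequency, built as in Eq. (8.9)."""
    if t >= 0.0:
        return 0.5 + area_0_to(t)
    return 0.5 - area_0_to(-t)


def interpolate(z, z_lo, a_lo, z_hi, a_hi):
    """Ordinary linear interpolation between two tabulated entries."""
    return a_lo + (a_hi - a_lo) * (z - z_lo) / (z_hi - z_lo)


# Area of the tail below z = -2, and combined area beyond z = +3 and z = -3.
tail_below_minus_2 = big_phi(-2.0)
beyond_3 = 2.0 * (1.0 - big_phi(3.0))

# Imitate a five-place table at z = 0.84 and 0.85, then interpolate.
a_lo = round(area_0_to(0.84), 5)
a_hi = round(area_0_to(0.85), 5)
interp = interpolate(0.8416, 0.84, a_lo, 0.85, a_hi)

print(f"area from 0 to 1     : {area_0_to(1.0):.4f}")
print(f"tail below z = -2    : {tail_below_minus_2:.4f}")
print(f"area beyond +-3      : {beyond_3:.4f}")
print(f"interpolated, 0.8416 : {interp:.5f}")
```

Linear interpolation between hundredths agrees with the directly computed area to about five decimal places here, which is why the text regards it as sufficient.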

8.5 Properties of the Normal Curve. The following properties can be 
established from the definition (8.8) with the help of calculus, but must for 
the most part be taken for granted at the level of the present course: 

1. The mean, median, and mode coincide at z = 0. The height of the 
maximum ordinate is $\phi(0) = 1/(2\pi)^{1/2} = 0.3989$. 

2. The curve is convex to the z-axis for |z| > 1 and concave to the z-axis 
for |z| < 1. The points on the curve for which z = ±1 are called points of 
inflection, and their position is important in making an accurate drawing of 
the curve. 

3. The standard deviation is 1, and the mean absolute deviation is 
$(2/\pi)^{1/2} = 0.798$. The quartiles are equidistant from z = 0, and therefore 
the quartile deviation is the value of t for which 

$\int_0^{t} \phi(z)\,dz = 0.25$

From the tables this is 0.6745. The proportion of values of z lying between 
−0.6745 and +0.6745 is 0.5000. The number 0.6745 is often, although 
rather ambiguously, called the “probable error” of z. 

4. All the standardized moments $\alpha_r$ with r odd are zero. The even moments 
are $\alpha_2 = 1$, $\alpha_4 = 1 \cdot 3$, $\alpha_6 = 1 \cdot 3 \cdot 5$, $\alpha_8 = 1 \cdot 3 \cdot 5 \cdot 7$, etc. 

For any other normal curve with area N, mean μ, and standard deviation σ 
we convert from z and φ(z) to x and y by the relations 

(8.10) $x = \mu + \sigma z$
       $y = N\phi(z)/\sigma$

The percentage distribution of area under the normal curve is indicated 
approximately in Fig. 29, where distances along the horizontal axis are given 
in units of σ. The same thing is shown in Fig. 30 with distances in units of Q 
(the quartile deviation). 




Fig. 29 


Fig. 30 
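
The numerical properties listed in this section can be checked without calculus. The sketch below assumes `statistics.NormalDist` from the Python standard library (version 3.8 or later) and approximates the moments by a simple midpoint-rule sum; the function name `moment` and the integration range of ±10 are illustrative choices, not part of the text.

```python
import math
from statistics import NormalDist

std = NormalDist(0.0, 1.0)  # standard normal variate z

max_ordinate = std.pdf(0.0)               # property 1: phi(0)
mean_abs_dev = math.sqrt(2.0 / math.pi)   # property 3: (2/pi)^(1/2)
quartile_dev = std.inv_cdf(0.75)          # property 3: the "probable error"


def moment(r: int, half_width: float = 10.0, steps: int = 20000) -> float:
    """Approximate the r-th moment of z about 0 by a midpoint-rule sum."""
    dz = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        z = -half_width + (i + 0.5) * dz
        total += (z ** r) * std.pdf(z) * dz
    return total


print(f"phi(0)             : {max_ordinate:.4f}")
print(f"mean abs deviation : {mean_abs_dev:.3f}")
print(f"quartile deviation : {quartile_dev:.4f}")
for r in range(1, 9):
    print(f"moment of order {r}  : {moment(r):.4f}")
```

The odd moments come out as zero to rounding error, and the even ones reproduce the products 1, 1·3, 1·3·5, 1·3·5·7 of property 4.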


8.6 Fitting a Normal Curve to a Distribution. A set of data as collected 
and tabulated usually relates to a sample of N individuals from a finite or 
infinite population. Other random samples of N from the same population 
would yield different frequency distributions, but if the sample is fairly large 
nearly all these distributions would be much alike. If the sample 
distribution appears to be reasonably symmetrical, bell-shaped, and tapering off 
gradually at both ends, it may be worth while to try whether it can be fitted 
satisfactorily with a normal curve. The theoretical curve idealizes the 
observational data, smoothing out the irregularities due to sampling 
fluctuation. Furthermore, if the fit is good, the mathematical statistician can 
proceed to deduce various results about the behavior of samples from a normal 
parent population and can feel confident that his assumptions apply 
reasonably well to the actual population sampled. 

In fitting equations (8.10) to a given distribution we assume that: 

1. The area under the curve is equal to the area of the histogram (that is, N). 

2. The mean and variance of the normal curve are unbiased estimates of 
the population mean and variance based on the corresponding statistics of 
the sample. These unbiased estimates are furnished by the k-statistics 
(§7.9). The estimate of μ is $k_1$ or $\bar{x}$ (the sample mean), and the estimate of 
σ² is $k_2$, or $N s_x^2/(N - 1)$. 

The procedure of fitting a normal curve to an observed distribution will 
now be illustrated with the data of Table 27, §7.7, referring to weights of 
1000 Glasgow schoolgirls. The calculated values of $k_1$ and $k_2$ in the original 
units are 47.712 and 33.344, respectively, so that we may take as parameters 
of the normal curve: 

N = 1000 
μ = 47.712 lb 
σ = (33.344)^{1/2} = 5.7744 lb 

We now calculate standardized z values corresponding to selected values of x, 
by putting 

$z = \frac{x - 47.712}{5.7744} = 0.17318x - 8.2627$

Appropriate values of x are the end values $x_e$ and the class marks $x_c$. For 
the purpose of testing the fit, the end values of z will be required and these are 
given in the accompanying Table 30. The values corresponding to the class 
marks merely provide additional points for plotting the curve, and these 
have already been given in Table 27. 

The ordinates of the normal curve at the given x-values are then obtained 
from 

$y = \frac{N}{\sigma}\,\phi(z) = 173.18\,\phi(z)$

using the values of φ(z) corresponding to the calculated z, obtained from 
Table I in the Appendix. A smooth curve may then be drawn through the 
plotted points (x, y). Note that the mode of the curve should be at 
x = 47.712 and that the curve should be symmetrical about the ordinate 
through the mode. 

Table 30. Ordinates of Fitted Normal Curve 

z = 0.17318 x_e − 8.2627 
y = 173.18 φ(z) 

x_e (lb)      z         φ(z)        y        f/c
  31.5     −2.808     0.00774      1.34      0.25
  35.5     −2.115     0.0426       7.38      3.50
  39.5     −1.422     0.1451      25.13     14.00
  43.5     −0.729     0.3058      52.96     43.00
  47.5     −0.0367    0.3987      69.05     61.25
  51.5      0.656     0.3217      55.71     65.75
  55.5      1.349     0.1606      27.81     39.00
  59.5      2.042     0.0496       8.59     16.75
  63.5      2.734     0.00951      1.65      5.75
  67.5      3.427     0.00112      0.19      0.75

After the curve has been drawn, the histogram for the observed data may 
be constructed. Since the class interval is here 4 lb, the heights of the 
rectangles are the frequencies divided by 4, as given in the column f/c. The 
values of x_e are the ends of the bases of the corresponding rectangles. The 
completed diagram is drawn in Fig. 31. 
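
The calculation of the ordinates can be sketched in a few lines. This is only an illustration under stated assumptions: the end values are taken as 31.5, 35.5, ..., 67.5 lb (class interval 4 lb), the parameters are those quoted above, and φ(z) is computed directly from Eq. (8.8) rather than read from Table I, so the last digit may differ slightly from tabulated values.

```python
import math

N = 1000                      # total frequency
mu = 47.712                   # estimate of the mean, lb
sigma = math.sqrt(33.344)     # estimate of the standard deviation, lb

# Assumed class end values x_e: 31.5, 35.5, ..., 67.5 lb.
x_e = [31.5 + 4.0 * i for i in range(10)]


def phi(z: float) -> float:
    """Ordinate of the standard normal curve, Eq. (8.8)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


rows = []
for x in x_e:
    z = (x - mu) / sigma          # standardized value, Eq. (8.7)
    y = N * phi(z) / sigma        # ordinate of the fitted curve, Eq. (8.10)
    rows.append((x, z, phi(z), y))

print(f"sigma = {sigma:.4f}, N/sigma = {N / sigma:.2f}")
for x, z, p, y in rows:
    print(f"{x:6.1f} {z:8.3f} {p:9.5f} {y:7.2f}")
```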

8.7 Graduation. The areas under the fitted curve and over the class 
intervals are called theoretical frequencies. Thus in Fig. 31 the shaded area 
represents the theoretical frequency corresponding to the observed frequency 
which is represented by the rectangle the midpoint of whose base is 41.5 lb. 
The determination of the theoretical frequencies is called “graduation by the 
normal curve.” It is a process of smoothing out the data to fit the curve. 

In order to enter a standard table of areas the x_e values must be changed 
into z values. This has already been done in Table 30. For each of these z 
values, we read off from Table I in the Appendix the area A under the standard 
normal curve from −∞ up to the calculated z. For positive values of z, 

$A = \int_0^{z} \phi(z)\,dz + 0.5$

and for negative values of z, 

$A = 0.5 - \int_0^{|z|} \phi(z)\,dz$

For example, when z = −2.115, we find that $\int_0^{2.115} \phi(z)\,dz = 0.4828$, so that 
A = 0.0172. These values of A represent relative cumulative frequencies. 
By differencing them (see §1.11) we get the relative frequencies ΔA 
corresponding to the various class intervals, as set out in Table 31. The first class, 
however, extends from z = −∞ up to z = −2.808, and the last class from 
z = 2.734 up to z = ∞. The absolute frequencies are found by multiplying 
ΔA by N (= 1000). These values of the theoretical or calculated frequencies 
f_c may be compared with the corresponding observed frequencies f_o, which 
are repeated in the last column but one of Table 31. It will be seen that 
there is a general similarity, although it would be hard to say whether or not 
the agreement is satisfactory. A method of judging this agreement will be 
given in a later chapter. 

Table 31. Areas of Fitted Normal Curve 

x_e (lb)      z          A         ΔA      NΔA = f_c     f_o     F_</N
  −∞                     0                                        0
  31.5     −2.808     0.0025     0.0025        2.5          1     0.001
  35.5     −2.115     0.0172     0.0147       14.7         14     0.015
  39.5     −1.422     0.0775     0.0603       60.3         56     0.071
  43.5     −0.729     0.2330     0.1555      155.5        172     0.243
  47.5     −0.0367    0.4854     0.2524      252.4        245     0.488
  51.5      0.656     0.7441     0.2587      258.7        263     0.751
  55.5      1.349     0.9113     0.1672      167.2        156     0.907
  59.5      2.042     0.9794     0.0681       68.1         67     0.974
  63.5      2.734     0.9969     0.0175       17.5         23     0.997
   ∞                  1.0000     0.0031        3.1          3     1.000
  Total                                     1000.0       1000
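
The graduation itself (areas A, differences ΔA, and theoretical frequencies) can be sketched as follows. The code assumes `statistics.NormalDist` from the Python standard library in place of Table I, class boundaries at 31.5, 35.5, ..., 63.5 lb with open-ended first and last classes, and observed frequencies that sum to 1000 as read from the table; small differences in the last digit from the tabulated values are to be expected.

```python
from statistics import NormalDist

N = 1000
fitted = NormalDist(mu=47.712, sigma=33.344 ** 0.5)

# Assumed interior class boundaries x_e; the end classes are open-ended.
boundaries = [31.5 + 4.0 * i for i in range(9)]      # 31.5, ..., 63.5
observed = [1, 14, 56, 172, 245, 263, 156, 67, 23, 3]

# Relative cumulative frequencies A at -infinity, each boundary, +infinity.
A = [0.0] + [fitted.cdf(x) for x in boundaries] + [1.0]

# Differencing gives the relative frequency of each class; times N gives f_c.
delta_A = [hi - lo for lo, hi in zip(A, A[1:])]
f_c = [N * d for d in delta_A]

for upper, d, fc, fo in zip(boundaries + [float("inf")], delta_A, f_c, observed):
    print(f"up to {upper:5.1f}: dA = {d:.4f}  f_c = {fc:6.1f}  f_o = {fo:4d}")
print(f"total f_c = {sum(f_c):.1f}, total f_o = {sum(observed)}")
```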


8.8 Justification for Using the Normal Curve. The theoretical frequency 
curve has the same total area, the same mean, and the same standard 
deviation as the observed distribution. These conditions were, in fact, imposed 
in the process of graduation. Furthermore, if the fitting is justifiable, the 
skewness and kurtosis of the frequency curve should not differ appreciably 
from those of the distribution itself. In the example which we have carried 
through in detail, we find that the estimated skewness of the population is 
g1 = 0.115 and the estimated kurtosis is g2 = −0.104. For a normal curve 
these values should be zero. The question arises whether these observed 
statistics are sufficiently near to the theoretical values to justify us in using 
the normal curve to graduate the data. It is proved in Part Two that, for 
a large value of N and for a normal parent population, the variance of g1 
among random samples of the same size is approximately 6/N and the 
variance of g2 is approximately 24/N. For N = 1000, this means that the 
standard deviation of g1 among samples is about (0.006)^{1/2} = 0.077 and that of g2 
is (0.024)^{1/2} = 0.155. Now the observed g1 differs from 0 by about 1½ times 
its standard deviation and the observed g2 differs from 0 by about ⅔ of its 
standard deviation, and these discrepancies are quite compatible with the 
assumption that the true values of g1 and g2 are zero. The fraction of the area 
under a normal curve outside the range of 1½ standard deviations from the 
mean is about 0.13, which means that, if the values of g1 are distributed 
approximately normally, there is a probability of about 0.13 of getting a value of g1 at least as 
different from zero as 0.115. A probability of 0.13 is not so small that we need 
reject the assumption that our sample comes from a normal distribution. 
(We should need a probability at least as small as 1/20 and perhaps even as small 
as 1/100 to do this.) With the kurtosis the argument is still stronger. 
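
The arithmetic of this argument can be sketched as below. The large-sample variances 6/N and 24/N are taken from the text; treating g1 and g2 as approximately normal about zero, and using `statistics.NormalDist` for the tail areas, are the assumptions of the sketch, and the function name `two_sided_tail` is an illustrative choice.

```python
import math
from statistics import NormalDist

N = 1000
g1, g2 = 0.115, -0.104          # estimated skewness and kurtosis

se_g1 = math.sqrt(6.0 / N)      # large-sample standard deviation of g1
se_g2 = math.sqrt(24.0 / N)     # large-sample standard deviation of g2

std = NormalDist()


def two_sided_tail(ratio: float) -> float:
    """Area under the normal curve farther than |ratio| s.d. from the mean."""
    return 2.0 * (1.0 - std.cdf(abs(ratio)))


ratio_g1 = g1 / se_g1
ratio_g2 = g2 / se_g2
p_g1 = two_sided_tail(ratio_g1)
p_g2 = two_sided_tail(ratio_g2)

print(f"g1: s.d. = {se_g1:.3f}, ratio = {ratio_g1:.2f}, tail area = {p_g1:.3f}")
print(f"g2: s.d. = {se_g2:.3f}, ratio = {ratio_g2:.2f}, tail area = {p_g2:.3f}")
```

Neither tail area is small, which is the sense in which the observed skewness and kurtosis are compatible with a normal parent population.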

If we plot the values of A in Table 31 against the corresponding values of 
x_e, we get a theoretical relative cumulative frequency curve, or ogive, the 
characteristic shape of which is shown in Fig. 32. We can plot the observed 
relative cumulative frequencies F_</N, given in the last column of Table 31, 
against the same values of x, and note how well they lie on the curve. The 
agreement can be judged more easily, however, if the scale of the graph paper 
is so adjusted that the ogive becomes a straight line. Imagine the vertical 
scale so stretched out in the neighborhood of A = 0 and A = 1 compared 
with the scale near A = 0.5 that the ogive is pulled out into a straight line. 
This is in effect what is done in the so-called “probability graph paper,” 
which is illustrated in Fig. 33. This paper is readily obtainable* and is 
convenient for many purposes. 

Fig. 32. Ogive Fitted to Cumulative Frequency Distribution 

The plotted points in Fig. 33 are seen to lie fairly well on a straight line, 
which indicates that a normal curve may be expected to fit. Discrepancies 
near the ends of the distribution may be ignored. By drawing in the straight 
line one may make a quick rough estimate of the median, quartiles, etc., for 
the distribution, and estimate the theoretical frequencies lying between given 
values of x. For a discussion of a different type of probability paper see 
Reference 2. 

* The Codex Book Company, New York. 
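
What probability paper does graphically can be imitated numerically: each observed relative cumulative frequency is converted to the z that would give it under a normal curve, and z is then plotted against x. In the sketch below the observed F_</N values are the running totals of the observed frequencies divided by 1000, and an unweighted least-squares line stands in for the line drawn by eye; both the fitting method and the variable names are assumptions of the sketch, not the procedure of the text.

```python
from statistics import NormalDist, mean

# Class boundaries (lb) and observed relative cumulative frequencies.
x_e = [31.5 + 4.0 * i for i in range(9)]
F_rel = [0.001, 0.015, 0.071, 0.243, 0.488, 0.751, 0.907, 0.974, 0.997]

# Stretching the vertical scale amounts to replacing F/N by its normal z.
std = NormalDist()
z = [std.inv_cdf(p) for p in F_rel]

# Unweighted least-squares line z = a + b*x (a stand-in for a ruler).
x_bar, z_bar = mean(x_e), mean(z)
b = sum((x - x_bar) * (zv - z_bar) for x, zv in zip(x_e, z)) / sum(
    (x - x_bar) ** 2 for x in x_e
)
a = z_bar - b * x_bar

median_est = -a / b      # x at which the line crosses z = 0
sigma_est = 1.0 / b      # change in x per unit change in z
q1_est = median_est - 0.6745 * sigma_est
q3_est = median_est + 0.6745 * sigma_est

print(f"rough median   : {median_est:.1f} lb")
print(f"rough s.d.     : {sigma_est:.2f} lb")
print(f"rough quartiles: {q1_est:.1f} lb, {q3_est:.1f} lb")
```

The estimates are only rough, as the text says; they come out near, but not equal to, the fitted values 47.712 lb and 5.7744 lb, partly because the end points carry as much weight as the central ones in this simple fit.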

8.9 Purpose of Graduating a Curve. Since the graduating curve is 
characterized by practically the same set of moments as the observed distribution, 
one may wonder what is the purpose of graduation. The following quotation 
from Prof. B. H. Camp illustrates this point (see Reference 3). 

There are three main reasons why a student should be taught to graduate a curve. The 
first, and least important, has to do with the use of a smooth curve in place of a jagged 
sample. The second, and most important, is that it is necessary for the mathematical 
development of statistics that the mathematician should be told what assumptions he may 
make. These usually depend on the types of frequency curves which can be depended on 
to fit phenomena. . . . A third reason, intermediate in importance between the other two, 
is that in testing a priori theories in various fields, it is often necessary to test the efficacy 
of the frequency distributions which are results of these theories. 

The second and third of Prof. Camp’s reasons are not very easy to 
understand at the level of this book. In the theory of sampling (see Part Two) 
it is necessary to make assumptions about the parent population, and the 
mathematician naturally chooses for investigation a parent population which 
can be represented by a tractable mathematical function. Of all the curves 
which might be taken to represent reasonable frequency distributions, the 
normal curve has the simplest mathematical properties. 

The first reason is more readily understood. Occasionally in practical 
problems it may be desirable to use the theoretical frequencies obtained by 
graduation in place of the observed data, which probably contain irregularities 
due in part to grouping, in part to sampling fluctuations. We cite here two 
illustrations. 

Example 1. A company which operates a chain of men’s haberdashery stores planned 
to bring out a new line of about ie0,000 lightweight sport shirts suitable for camping, 
hunting, etc. The question arose as to the determination of the number of each size that 
should be ordered from the factory. Their previous distribution of sizes had not been 
satisfactory because the demand for certain sizes had been different from the number 
manufactured. Therefore the statistical department was requested to recommend the 
distribution of the proposed order according to neck sizes. The solution of the problem 
hinged upon the availability of data giving the measurements of neck circumferences of a 
large sample of men. Satisfactory data were found in the “Reports of the Medical 
Department of the United States Army in the World War,” which gave a table of the neck 
measurements in centimeters of 95,102 white troops at demobilization. Since these data are 
tabulated in class intervals which are slightly different from the ranges used in standard 
shirt-band sizes, a slight adjustment was necessary. But essentially a normal curve was fitted 
to this distribution and the graduated frequencies were taken as the number of potential 
customers for each shirt size. The result was quite satisfactory. 

Example 2. A well known and interesting illustration of the desirability of smoothing 
occurs in the census returns. The census takers’ records show more persons alive at age 30 
than at age 29, more at age 35 than at age 34, more at 40 than at 39, etc. This is probably 
due to the fact that men (as well as women) do not tell their exact ages. A person who is 
actually 41 or 42 and known to be 40 or so, says he is 40. The recorded data show artificial 
bumps at every age which is a multiple of 5. Naturally the Census Bureau prefers the 
smoothed results to the observed. The student should not infer that the curve used to 
smooth these data is the normal type. The “life curve” is a continuously decreasing 
function. However, the same kind of quinquennial irregularity occurs in other actuarial data 
which do approximate the form of a normal curve. Many examples are given in Elderton, 
Frequency Curves and Correlation (Camb. Univ. Press, 4th ed., 1953). 

8.10 Normalizing an Ordered Series. Suppose a large class of students 
are given ratings A, B, C, D or F according to an estimate of their 
mathematical ability based on class-work and home-work, and it is desired to give 
approximate scores based on the theory that mathematical ability is more or 
less normally distributed. This can be done by forming a relative cumulative 
frequency distribution and finding z-values which correspond to the dividing 
points between the classes. These z-values can be transformed into x-values 
in any convenient way by fixing two points on the scale. For example, let 
the frequencies in the various classes be as shown in the following table. 


Rating     f      F_<     F_</N       z        x_e
  F        4        4     0.02     −2.054     29.5
  D       36       40     0.20     −0.842     50.0
  C       92      132     0.66      0.412     71.2
  B       46      178     0.89      1.227     85.0
  A       22      200     1.00        ∞
  Total  200


The relative cumulative frequencies F_</N correspond to ends of 
intervals, that is, to the dividing points between the classes. The 
corresponding values of z are found by interpolating in Table I of the 
Appendix, remembering that $F_</N = \int_{-\infty}^{z} \phi(z)\,dz = 0.5 + \int_0^{z} \phi(z)\,dz$ for z > 0 
and $F_</N = 0.5 - \int_0^{|z|} \phi(z)\,dz$ for z < 0. Thus, for F_</N = 0.20, the 
integral is 0.30. From the table we see that the integral from 0 to 0.84 is 0.29955 
and from 0 to 0.85 it is 0.30234. By interpolation between these values we 
find that −z = 0.8416, so that z = −0.8416. The remaining z values are 
obtained similarly. If we now choose an x-score of 50 as the dividing line 
between C and D and an x-score of 85 as that between A and B, we fix the 
normal curve completely, since this curve has only two parameters, μ and σ, 
which can be determined from two independent equations. The equations are 

$(50 - \mu)/\sigma = -0.842$
$(85 - \mu)/\sigma = 1.227$

from which 2.069σ = 35, or σ = 16.9, and μ = 64.25. The x-score 
corresponding to any z is now given by 

x = 64.25 + 16.9z, 

and the boundary scores are as given in the table above. The F scores are 
below 30, the D scores between 30 and 50, and so on. 
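
The normalizing procedure can be sketched as follows. The class frequencies (4, 36, 92, 46, 22) and the two fixed scores (50 and 85) are those of the example; `statistics.NormalDist.inv_cdf` is assumed in place of interpolation in Table I, so the results agree with the hand calculation only to about one decimal place.

```python
from statistics import NormalDist

ratings = ["F", "D", "C", "B", "A"]          # lowest to highest
freq = [4, 36, 92, 46, 22]
N = sum(freq)

# Cumulative frequencies at the upper boundary of each class.
cum = []
running = 0
for f in freq:
    running += f
    cum.append(running)

# z at each dividing point between classes (the top of A is at infinity).
std = NormalDist()
z = [std.inv_cdf(c / N) for c in cum[:-1]]   # F/D, D/C, C/B, B/A

# Fix the scale: score 50 at the D/C boundary, score 85 at the B/A boundary.
sigma = (85.0 - 50.0) / (z[3] - z[1])
mu = 50.0 - sigma * z[1]

scores = [mu + sigma * zv for zv in z]

print(f"mu = {mu:.2f}, sigma = {sigma:.2f}")
for lower, upper, zv, s in zip(ratings, ratings[1:], z, scores):
    print(f"boundary {lower}/{upper}: z = {zv:7.4f}, score = {s:5.1f}")
```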

If desired, scores may similarly be obtained for the medians of the respective 
classes. Thus, the median F-score will correspond to F_< = 2, the median 
D-score to F_< = 4 + 18 = 22, and so on. The scores so obtained are as 
follows: 

Rating     F_<     F_</N       z         x
med-F        2     0.01     −2.326     24.9
med-D       22     0.11     −1.227     43.5
med-C       86     0.43     −0.1764    61.3
med-B      155     0.775     0.7554    77.0
med-A      189     0.945     1.598     91.3

“Normalized” scores obtained in this way are not to be confused with the 
“standardized” scores of §7.7. There is no assumption of a normal 
distribution with the latter type. 


Exercises 


1. Find by linear interpolation from Table I of the Appendix the values of φ(z) for 
(a) z = 2.174 and (b) z = −0.625. Ans. (a) 0.03755, (b) 0.32816. 

2. Find the values of $\Phi(t) = \int_{-\infty}^{t} \phi(z)\,dz$ for (a) t = 1.81; (b) t = −0.24; (c) t = 1.037. 
Ans. (a) 0.96485; (b) 0.40517; (c) 0.85013. 
Hint. $\int_{-\infty}^{0} \phi(z)\,dz = 0.5$. 

3. Find the area under the standard normal curve: (a) between z = 1.5 and z = 2.5; 
(b) between z = −2 and z = 1.3. Ans. (a) 0.06060; (b) 0.88045. 
Hint. The area from 1.5 to 2.5 is the area from 0 to 2.5 minus the area from 0 to 1.5. 
The area from −2 to 1.3 is the area from −2 to 0 plus the area from 0 to 1.3, and the area 
from −2 to 0 is the same as the area from 0 to 2. 

4. Show from Table I that for a normal curve with mean μ and standard deviation σ 
the percentages of area outside the given ranges are as stated: 

Outside μ ± σ = 31.74% 
Outside μ ± 2σ = 4.55% 
Outside μ ± 3σ = 0.27% 

Hint. Convert these ranges into z units. 

5. Find the values of z, given the following values for φ(z): 
(a) 0.1267, (b) 0.0335, (c) 0.0034. Ans. (a) ±1.5146; (b) ±2.2259; (c) ±3.087. 
Hint. φ(1.51) = 0.12758, φ(1.52) = 0.12566. Interpolate. 

6. Find t such that 
(a) $\int_0^{t} \phi(z)\,dz = 0.2746$, (b) $\int_{-t}^{t} \phi(z)\,dz = 0.9973$, (c) $\int_{-\infty}^{t} \phi(z)\,dz = 0.7500$. 
Ans. (a) 0.7541; (b) 3.00; (c) 0.6745. 

7. For a normal distribution, N = 1500, μ = 75, σ = 10. Find (a) the value of x such 
that F_< = 800; (b) how many of the N values correspond to x < 80. 
Ans. (a) 75.83; (b) 1037. 
Hint. (a) $\int_{-\infty}^{z} \phi(z)\,dz = 800/1500$. Find z and convert to x by the relation x = 75 + 10z. 
(b) Find z for x = 80, calculate $\int_{-\infty}^{z} \phi(z)\,dz$ and multiply by 1500. 

8. Find more exact values for the approximate percentages given in Fig. 30. 

9. For a certain normal distribution the median is 89.0 and the first quartile 75.5. What 
is the standard deviation? Ans. 20.0. 

10. Given that for a normal distribution N = 1000, μ = 20, σ = 3.5, find (a) the value 
of Q1; (b) the range of x corresponding to the middle 500 of the distribution; (c) the value 
of x corresponding to the 90th percentile. 

11. If for a normal distribution N = 300, μ = 75, σ = 15, how many values lie between 
x = 60 and x = 70? 

12. In a college 8 grades are given, namely, A, A−, B, B−, C, C−, D, and F. On the 
assumptions that ability in a large class is approximately normally distributed, that the 
mean of the distribution lies at the boundary between B− and C, and that each grade 
interval corresponds to 0.8σ, how many out of a total of 1000 should there be in each grade? 
Ans. 8, 47, 157, 288, 288, 157, 47, 8. 

13. It is desired to normalize scores in a class of 16 students, the order of the students 
having been settled by their scores on a test. Obtain the normalized scores, supposing that 
the lowest score is to be 30 and the highest 95. 
Hint. A relative cumulative frequency table is formed as in §3.7, the students in order 
from lowest to highest being assigned cumulative frequencies of 0.5, 1.5, 2.5, . . . , 15.5, 
which are then divided by 16. Corresponding values of z are obtained from Table I. 

14. (Yule and Kendall). A collection of human skulls is divided into three classes 
according to the value of a “length-breadth index” x. Skulls with x < 75 are classed as 
dolichocephalic (long-headed), those with 75 < x < 80 as mesocephalic (medium), and those with 
x > 80 as brachycephalic (short-headed). The percentages in the three classes in this 
collection are 58, 38, and 4. Find approximately the mean and standard deviation of x, on the 
assumption that x is normally distributed. Ans. μ = 74.4, σ = 3.23. 

15. Values of the skewness (g1) and kurtosis (g2) were worked out in §7.13 for the 
population of spans of adult males. Are these values reasonably near to zero, according to the 
criterion discussed in §8.8? Write down the equation of the best-fitting normal curve for 
this population. 

16. Graduate by means of a normal curve the distribution of lengths of telephone calls, 
Table 25, p. 87. (Take μ = 477.3 sec, σ = 1 ,5.7 sec.) 

17. A distribution of weekly wages for 906 miners at a certain date showed the following 
results: x̄ = $36.13, s_x = $8.87, α3 = 0.007, α4 = 3.02. Assuming that the true distribution 
is approximately normal with the mean and standard deviation estimated from the sample, 
calculate the proportion of miners who received weekly wages (a) in excess of $65, and 
(b) less than $25. 

18. An urban electric railway company operating a large subway uses thousands of 
electric light lamps in its underground stations. On January 1, 1954, the company put into 
service 5000 new lamps. Assume that the distribution of length of life for these lamps is 
normal, with a mean of 60 days and a standard deviation of 19 days. If January 1 is counted 
as a full day, how many lamps out of the 5000 new ones would need to be replaced by 
midnight January 31, 1954? How many by March 10, 1954? 

19. (Camp). The standard deviation of a set of 100,000 high school grades was 11%, 
and the mean grade was 78%. Assuming that the distribution was normal, find (a) how 
many grades were above 90%, (b) how many were below 70%, (c) what was the 99th 
percentile, (d) what was the semi-interquartile range. 

20. In a certain normal distribution we have N = 1000, μ = 50, σ = 10. For this 
distribution (a) convert the values x = 20, 30, 40, 50, 60, 70, 80 into the corresponding z values; 
(b) find the corresponding values of φ(z); (c) convert these into y values; (d) plot the points 
(x, y) and sketch a smooth curve joining them; (e) calculate the partial frequencies between 
20 and 30, between 40 and 50, and between 42 and 74; (f) find the values of x for which 
F_< = 250, 500, 750, respectively. 

21. Suppose a variate v is normally distributed with mean 0 and variance 25. 

(a) Give the equation of the frequency curve for a population of size N. 

(b) If there are 793 values between v — and v = 0, find N. 

(c) Find what percentage of values correspond to v > 10. 

(d) Find the value of v for which F_</N = 0.75. 

22. 100 individuals are graded in 5 classes, ranging from A (the highest) to E (the lowest), 
the frequency distribution being as follows: 

Grade    A     B     C     D     E
f        5    21    39    28     7

If the grades are normalized and scores are given so that 90 is the median score for class A 
and 10 the median score for class E, find what scores correspond to the points of division 
between the classes. 


References 

1. H. M. Walker, “Bicentenary of the Normal Curve,” J. Amer. Stat. Assoc., 29, 1934, 
pp. 72-75. 

2. F. C. Martin and D. H. Leavens, “A New Grid for Fitting a Normal Probability 
Curve to a Given Frequency Distribution,” J. Amer. Stat. Assoc., 26, 1931, pp. 178-183. 

3. H. C. Carver, “The Concept and Utility of Frequency Distributions,” J. Amer. Stat. 
Assoc., 26, 1931 (Supplement), pp. 33-36. Discussion on above, by B. H. Camp, p. 36. 



CHAPTER IX 

PROBABILITY 

9.1 Meaning of Probability. The notion of probability has been 
introduced several times in previous chapters, but without any very precise 
definition. Indeed, it is extremely difficult to give a precise definition at an 
elementary level. The notion is so important, however, and will recur so 
frequently in later chapters, that we must attempt some further explanation, 
and indicate how simple calculations connected with probabilities can be made. 

From the statistical point of view, the notion of probability is a rather 
natural extension of the notion of relative frequency. In 100 throws with a 
die, the 6-spot turns up, say, 15 times. We try again and this time we throw 
17 sixes. If we keep on for thirty or forty sets of 100 throws each, we shall 
obtain a set of relative frequencies which will cluster around the values 
0.16 and 0.17, and it is likely that the mean of these relative frequencies will be 
quite close to 1/6. It is a matter of common experience that when we have 
a well-defined process which can be observed over and over again (like throw- 
ing a die) the relative frequencies of some definite characteristic associated 
with the process (like the number 6) always show this tendency to cluster 
around a fixed value, and they do so more closely the greater the number of 
observations. This fixed value is called the probability of the characteristic, 
and it is obviously a number between 0 and 1 inclusive. This number can 
be thought of as the ideal or theoretical frequency of an event, the event being 
the appearance of the characteristic in question. 
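
The die-throwing experiment is easy to imitate on a computer. The sketch below uses Python's pseudo-random generator as a stand-in for a real die (an assumption: a fair die with independent throws), with a fixed seed so that the run can be repeated; the function name `relative_frequency_of_six` is illustrative.

```python
import random

rng = random.Random(0)   # fixed seed, so the experiment is repeatable


def relative_frequency_of_six(throws: int) -> float:
    """Throw a simulated die `throws` times; return the proportion of sixes."""
    sixes = sum(1 for _ in range(throws) if rng.randint(1, 6) == 6)
    return sixes / throws


# Forty sets of 100 throws each, as in the text.
sets = [relative_frequency_of_six(100) for _ in range(40)]
mean_rf = sum(sets) / len(sets)

# One much longer run, to show the closer clustering with more observations.
long_run = relative_frequency_of_six(100_000)

print(f"smallest and largest of the 40 values: {min(sets):.2f}, {max(sets):.2f}")
print(f"mean of the 40 relative frequencies  : {mean_rf:.4f}")
print(f"relative frequency in 100,000 throws : {long_run:.4f}")
print(f"1/6                                  : {1 / 6:.4f}")
```

The individual sets of 100 scatter noticeably, while their mean and the long run both lie close to 1/6, which is the behavior the definition of probability rests on.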

Although the theory of probability started from discussions erf games of 
chance, and although these games still furnish useful illustrations, probability 
has wide applications in many different fields. The ^^event^' may, for example, 
be the appearance of a defect in an article turned out in large numbers by a 
certain factory machine, or it may be the occurrence of color-blindness in an 
adult male, or the possession of a taxable income of more than $6000. There 
is in each case a population, finite or infinite, relative to which the probability 
is defined. Thus, samples of 100 machined articles may be examined daily 
for defects, and, as long as the machine is properly adjusted, the relative 
frequencies will lie near a number which is the probability that this machine 
will turn out a defective article. Of course, something may go wrong with 
the adjustments, and the machine may suddenly start turning out large 
numbers of defectives, but then we say that the process is “out of control.” 
The basic conditions have changed, and we are no longer dealing with the 
same population. The population of adult males in, say, Canada, is not 
infinite, but it is a large number, and there is a probability that if one were 
selected at random and examined he would turn out to be color-blind. This 
probability can be assessed from the known relative frequency of color- 
blindness among groups which have been tested, such as recruits for the Air 
Force. The relation of probabilities to relative frequencies is something like 
the relation of the ideal points and lines of geometry to the actual chalk or 
pencil marks made on the blackboard or on paper. These marks are a crude 
approximation to the ideal indefinitely small points and indefinitely thin 
lines about which we reason. In mechanics, again, we discuss the properties 
of “particles” and “rigid bodies,” which are nonexistent abstractions, but the 
practical value of theoretical mechanics lies in the fact that many observable 
objects in the real world behave very much like these abstractions. How 
far the calculus of probabilities applies to the real world can be determined 
only by observation and experiment. 

9.2 Combination of Relative Frequencies. Probabilities are assumed to 
obey certain laws of combination suggested by the corresponding laws for 
relative frequencies. To illustrate these laws, let us consider a simple two-way 
frequency distribution, known as a two-by-two (2 × 2) table, such as Table 
32, which gives data on the incidence of a certain disease among a group of 


Table 32. 2 × 2 Table of Effect of Inoculation 

                 Inoculated    Not-inoculated    Total
Attacked              2              10            12
Not-attacked          5               3             8
Total                 7              13            20


20 people, some of whom had been inoculated with a drug and others not. 
The frequencies in the vertical margin, 12 and 8, form a marginal frequency 
distribution of attack; those in the horizontal margin, 7 and 13, form a marginal 
frequency distribution of inoculation. In both cases there are two classes 
in the distribution, and the marginal frequencies are given by adding the 
individual frequencies, either along the rows or up the columns of the table. 

The first column by itself gives a distribution of incidence of attack among 
the inoculated; this is called a conditional distribution. Similarly the second 
column gives a conditional distribution for the not-inoculated, and the two 
rows give conditional distributions of inoculation among the attacked and 
among the not-attacked. 

The conditional relative frequency of attack among the inoculated is 2/7, 
and the relative frequency of inoculation in the sample is 7/20. The relative 
frequency of individuals in the sample who are both inoculated and attacked 
is 2/20, which is the product of 2/7 and 7/20. Again, if we want the total 
frequency of individuals who have either been inoculated or not attacked by 
the disease we can compute it by adding the marginal frequencies for inocula- 
tion and non-attack and subtracting the frequency of individuals coming under 
both categories, namely, 7 + 8 — 5 = 10. This is the same as the total 
sample frequency less the number who are both attacked and not inoculated, 
that is, 20 — 10. 
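The two calculations just made on Table 32 can be checked mechanically. This is a minimal sketch using exact fractions; the dictionary layout and the variable names are assumptions made for illustration.

```python
from fractions import Fraction

# Cell counts of Table 32: (inoculation status, attack status) -> frequency
table = {
    ("inoculated", "attacked"): 2,
    ("not-inoculated", "attacked"): 10,
    ("inoculated", "not-attacked"): 5,
    ("not-inoculated", "not-attacked"): 3,
}
total = sum(table.values())

# Marginal frequencies for inoculation and for non-attack.
inoculated = sum(n for (inoc, _), n in table.items() if inoc == "inoculated")
not_attacked = sum(n for (_, att), n in table.items() if att == "not-attacked")

# Conditional frequency times marginal frequency gives the joint frequency.
attack_given_inoculated = Fraction(table[("inoculated", "attacked")], inoculated)
joint = attack_given_inoculated * Fraction(inoculated, total)

# "Inoculated or not attacked": add the marginals, subtract the overlap.
either = inoculated + not_attacked - table[("inoculated", "not-attacked")]

print(joint, either)
```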

These results can be generalized as follows: 

Let A and B be two events and $\tilde{A}$ and $\tilde{B}$ the complementary events. That is, 
$\tilde{A}$ (read “A-tilde”) denotes the event “A does not occur.” For example, if 
A means attacked, $\tilde{A}$ means not-attacked, and every individual in the sample 
may be placed in one of these two classes. Similarly, every individual is 
either a B or a $\tilde{B}$ (namely, a B or a not-B). The generalized 2 × 2 frequency 
table is shown as Table 33, where $f_{11}$ means the frequency of simultaneous 


Table 33. Generalized 2 × 2 Table 

               $A$         $\tilde{A}$   Total
$B$            $f_{11}$    $f_{21}$      $r_1$
$\tilde{B}$    $f_{12}$    $f_{22}$      $r_2$
Total          $c_1$       $c_2$         $N$


occurrence of both A and B, etc. The marginal frequencies for A and $\tilde{A}$ 
are the column totals $c_1$ and $c_2$; the marginal frequencies for B and $\tilde{B}$ are the 
row totals $r_1$ and $r_2$; and N is the overall total frequency. Clearly, 
$N = c_1 + c_2 = r_1 + r_2$. We now use a notation borrowed from the subject of Symbolic 
Logic, in order to specify certain compound events. By AB we shall mean 
the simultaneous occurrence of both events A and B. By A + B we mean 
the occurrence of either A or B or both, which may be expressed as “A and/or 
B.” From this definition it follows that 

(9.1)    $A + B = A\tilde{B} + \tilde{A}B + AB$

(The reason for using mathematical notation in this sense will appear shortly.) 
Writing the corresponding relative frequencies as $f\{A\}$, $f\{AB\}$, etc., we see 
from Table 33 that 

(9.2)    $f\{A\} = c_1/N, \quad f\{\tilde{A}\} = c_2/N$
         $f\{AB\} = f_{11}/N$
         $f\{A + B\} = (f_{11} + f_{21} + f_{12})/N$

It follows immediately that 

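(9.3)    $f\{A\} + f\{\tilde{A}\} = 1$

Since $f_{11} + f_{21} + f_{12} = N - f_{22}$, 

(9.4)    $f\{A + B\} = 1 - f\{\tilde{A}\tilde{B}\}$

Also, $f_{11} + f_{21} = r_1$, and $f_{11} + f_{12} = c_1$, so that $f_{11} + f_{21} + f_{12} = r_1 + c_1 - f_{11}$. 
Hence, since $f\{B\} = r_1/N$, 

(9.5)    $f\{A + B\} = f\{A\} + f\{B\} - f\{AB\}$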

The observations in the first column of Table 33 form a conditional frequency 
distribution for B among those individuals, $c_1$ in number, for which 
A is known to occur. The event “B occurs when it is given that A also occurs” 
is denoted by $B|A$ (read “B given A”) and from Table 33 we have 

(9.6)    $f\{B|A\} = f_{11}/c_1$

Since $f\{AB\} = f_{11}/N$ and $f\{A\} = c_1/N$, we see that 

(9.7)    $f\{AB\} = f\{A\}\,f\{B|A\}$

and similarly 

(9.8)    $f\{AB\} = f\{B\}\,f\{A|B\}$

Relations (9.5), (9.7), and (9.8) form the basis of the axioms of probability 
given in the next section. 

9.3 Rules for Combining Probabilities. Since we have defined a prob- 
ability as a theoretical or idealized relative frequency, it is natural to assume 
that probabilities will obey the same rules of combination as relative fre- 
quencies. We therefore take it as axiomatic that 

(1) The probability of an event A, denoted by $P\{A\}$, is a number between 
0 and 1 inclusive, an impossible event having probability 0 and one that is 
certain to occur having probability 1. 

(2) The probability that at least one of the two events A and B occurs is 
given by 

(9.9)    $P\{A + B\} = P\{A\} + P\{B\} - P\{AB\}$

(3) The probability of the simultaneous occurrence of both A and B is 
given by 

(9.10)    $P\{AB\} = P\{A\}\,P\{B|A\} = P\{B\}\,P\{A|B\}$

If the two events A and B are mutually exclusive (meaning that it is impossible 
for both to occur together), $P\{AB\} = 0$, and (9.9) becomes 

(9.11)    $P\{A + B\} = P\{A\} + P\{B\}$

This is called the addition theorem for probabilities and is the reason for the 
use of the + sign to denote “and/or” which, when A and B are mutually 
exclusive, reduces to the strict alternative “either-or.” 

Since the events A and $\tilde{A}$ are mutually exclusive, and one of these must 
occur, we have 

(9.12)    $P\{A\} + P\{\tilde{A}\} = 1$

The events A and B are said to be independent if the probability of occurrence 
of A is the same whether B occurs or not, that is, if 

    $P\{A|B\} = P\{A|\tilde{B}\}$

Now 

    $P\{A\} = P\{AB\} + P\{A\tilde{B}\}$
    $= P\{B\}\,P\{A|B\} + P\{\tilde{B}\}\,P\{A|\tilde{B}\}$

by (9.10). If $P\{A|\tilde{B}\} = P\{A|B\}$, this becomes 

    $P\{A\} = P\{A|B\}\,(P\{B\} + P\{\tilde{B}\})$
    $= P\{A|B\}$

and therefore (9.10) can be written, for independent events, 

(9.13)    $P\{AB\} = P\{A\}\,P\{B\}$

This is called the multiplication theorem for probabilities, and is the reason 
for using AB to mean “A and B.” 

These theorems may be generalized to any number of events. Thus, if the 
events $A_1, A_2, \ldots, A_m$ are mutually exclusive and if one of them must occur, 

(9.14)    $P\{A_1\} + P\{A_2\} + \cdots + P\{A_m\} = P\{A_1 + A_2 + \cdots + A_m\} = 1$

Let us now suppose that the probability is the same for each of these m 
events, say P. Then mP = 1, so that P = 1/m. 

If we designate the first k of the m events as “favorable,” the probability 
of a favorable event is 

(9.15)    $P\{A_1 + \cdots + A_k\} = P\{A_1\} + P\{A_2\} + \cdots + P\{A_k\} = k/m$

This result was used as the definition of probability by Laplace and the early 
writers on the subject. An example is the throwing of a die, discussed in 
§9.1. Here there are six mutually exclusive possibilities and one of them, 
the turning-up of the 6-spot, is regarded as favorable, so that k = 1 and m = 6. 
If we consider that any one of the six faces of the die is precisely as likely to 
turn up as any other, the probability of throwing 6 is 1/6. This definition 
of probability is convenient in some situations, particularly in connection 
with games of chance, but its applicability is very limited, and in most statistical 
problems the frequency definition makes much more sense. For a 
further discussion see Chapter I of Part Two. 
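The rules of this section can be tried out on a case small enough to list completely. The sketch below assumes two fair dice, so that the 36 ordered outcomes are equally likely and probabilities are found by the ratio of favorable to possible cases; the events A and B and the helper `prob` are chosen purely for illustration.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely throws

def prob(event):
    """Favorable cases over possible cases, for a predicate on (x, y)."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))

def A(o):
    return o[0] == 6          # first die shows 6

def B(o):
    return o[1] == 6          # second die shows 6

p_a, p_b = prob(A), prob(B)
p_ab = prob(lambda o: A(o) and B(o))
p_a_or_b = prob(lambda o: A(o) or B(o))

# Conditional probability of A given B, obtained from the product rule.
p_a_given_b = p_ab / p_b

print(p_a, p_b, p_ab, p_a_or_b, p_a_given_b)
```

Here the conditional probability of A given B equals the unconditional probability of A, so the two events are independent and the joint probability is the product of the separate ones.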

9.4 Permutations. For a considerable number of simple calculations in 
probability, based on equation (9.15), it is necessary to make use of the 
elementary algebra of permutations and combinations, and we shall now de- 
velop some of this theory. 

A permutation of a finite number of distinguishable objects is any arrangement 
of these in a definite order. For example, the objects may be thought of 
as numbered wooden blocks. With two blocks there are two possible arrangements, 
1 2 and 2 1. With three blocks there are six arrangements, namely 
1 2 3, 1 3 2, 2 1 3, 2 3 1, 3 1 2, and 3 2 1. The student can easily verify, 
by writing them out, that there are 24 different arrangements of the numbers 
1, 2, 3, 4. We can get a general formula with n blocks, by considering that 
the first place in the ordered arrangement can be filled in n ways, since any 
one of the n blocks can be chosen. The next place can be filled in n - 1 
ways, since only n - 1 blocks are left to be picked. Similarly the third place 
can be filled in n - 2 ways, and so on, until the last place can be filled in only 
one way, with the last block left. The total number of ways is, therefore, 

    $n(n-1)(n-2)\cdots 3\cdot 2\cdot 1$

and this number is denoted by n! (read “n factorial”). For n = 4, n! = 24, 
for n = 5, n! = 120, and so on. 

This principle of filling places one at a time is often useful in working 
problems. 

Example 1. There are 6 seats available in a car. In how many ways can 6 persons be 
seated for a journey, if only 3 of them can drive? 

Here the first place may be taken as the driver's seat, which can be filled in 3 ways. For 
the next seat there are 5 persons available, for the next 4, and so on. The total number of 
ways is therefore 3 · 5 · 4 · 3 · 2 · 1 = 360. 
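Both the factorial rule and Example 1 can be confirmed by listing every arrangement. In this sketch the six persons are labeled 0 to 5 and persons 0, 1, 2 are assumed to be the three who can drive; these labels are illustrative only.

```python
from itertools import permutations
from math import factorial

# Count the arrangements of n numbered blocks by listing them all.
for n in (2, 3, 4, 5):
    listed = sum(1 for _ in permutations(range(1, n + 1)))
    assert listed == factorial(n)
    print(n, "blocks:", listed, "arrangements")

# Example 1: an arrangement assigns persons to seats, seat 0 being the driver's.
drivers = {0, 1, 2}
seatings = sum(1 for order in permutations(range(6)) if order[0] in drivers)
print("seatings with a qualified driver:", seatings)
```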

Theorem 1. The number of ways of selecting r objects out of n distinguishable 
objects, and arranging them in order, is 

(9.16)    $P(n,r) = n(n-1)\cdots(n-r+1) = n!/(n-r)!$

This is an application of the same principle. The first place can be filled 
in n ways, the second in n — 1, and the rth in (n — r + 1) ways, so that 
$P(n,r) = n(n-1)\cdots(n-r+1)$. If we multiply P(n,r) by (n - r)! we 
get $n(n-1)\cdots(n-r+1)(n-r)(n-r-1)\cdots 1$, which is n!, thus verifying 
the second form given in (9.16). P(n,r) is usually called the number of 
permutations of n things, r at a time. Observe that P(n,r) denotes an integer. 

Theorem 2. The number of ways of filling r places with objects selected out 
of n distinguishable objects, when the same object can be used as often as desired, 
is $n^r$. 

This follows at once from the fact that each place can be filled in n ways, 
regardless of the way the earlier places have been filled. 

Example 2. The number of 5-digit license plate numbers that can be formed from the 
ten digits 0, 1, · · · , 9 is $10^5$ = 100,000. If two letters of the alphabet followed by three 
digits are used, the number is $26^2 \cdot 10^3$ = 676,000. 
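Theorems 1 and 2 and the counts of Example 2 can be checked in a few lines. The function name `ordered_selections` and the small test sizes (5 blocks taken 3 at a time, 4 objects in 3 places) are illustrative choices, not taken from the text.

```python
from itertools import permutations, product
from math import factorial, perm

def ordered_selections(n, r):
    """P(n, r) from the second form of (9.16), n!/(n - r)!."""
    return factorial(n) // factorial(n - r)

# Theorem 1 checked by listing: ordered selections of 3 blocks out of 5.
listed = sum(1 for _ in permutations(range(5), 3))

# Theorem 2 checked by listing: 3 places, each filled from 4 objects,
# the same object being allowed more than once.
with_repetition = sum(1 for _ in product(range(4), repeat=3))

# Example 2: license plates.
five_digit_plates = 10 ** 5
letter_digit_plates = 26 ** 2 * 10 ** 3

print(listed, ordered_selections(5, 3), perm(5, 3))
print(with_repetition, five_digit_plates, letter_digit_plates)
```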

Theorem 3. If p out of n objects are indistinguishable from each other, the 
number of permutations is n!/p!. 

Proof: The total number of permutations is n!, but for every arrangement 
of the n - p different objects there are p! permutations which are identical, 
because they differ among themselves only by rearrangement of the p indistinguishable 
objects. The number of different permutations is therefore n!/p!. 

Corollary. The number of permutations of n objects, of which $n_1$ are alike of 
one kind, $n_2$ alike of another kind, and so on, is $n!/(n_1!\,n_2!\cdots n_k!)$, where 
$n_1 + n_2 + \cdots + n_k = n$. 

Example 3. The number of arrangements of the letters of the word “independent,” 
taken all together, is 11!/(3! 3! 2!) = 554,400, since there are 11 letters including 3 e's, 
3 n's, and 2 d's. 
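The corollary to Theorem 3 translates directly into a short function. In this sketch the function name `distinct_arrangements` is hypothetical, and the five-letter word “level” is an added test case, chosen because it is short enough to check the formula by direct listing.

```python
from collections import Counter
from itertools import permutations
from math import factorial

def distinct_arrangements(word):
    """n!/(n_1! n_2! ... n_k!) for the letters of `word`."""
    result = factorial(len(word))
    for count in Counter(word).values():
        result //= factorial(count)
    return result

# Direct listing: identical arrangements collapse when collected in a set.
listed = len(set(permutations("level")))

print(distinct_arrangements("independent"))
print(listed, distinct_arrangements("level"))
```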


9.5 Combinations. If we want to know in how many ways we can pick 
out a number of objects from a collection, not caring in what order they are 
arranged, we have a problem in combinations. Such problems are much more 
important in probability theory than those in permutations, because we are 
seldom interested in arrangements as such. The number of ways of picking 
out r objects from n distinguishable objects, called the number of combinations 
of n things r at a time, will be denoted by C(n,r) or by $\binom{n}{r}$, which may be read 
“n above r.” The latter notation is now common, and other notations such 
as $^nC_r$ are also used. 


Theorem 4. 

(9.17)    $C(n,r) = \dfrac{n!}{r!\,(n-r)!}$

By permuting each combination of r things among themselves we shall 
obtain all possible permutations of n things, r at a time. Each combination 
gives rise to r! permutations, so that $r!\,C(n,r) = P(n,r) = n!/(n-r)!$, 
whence the theorem follows. 


Corollary. 

(9.18) C(n, r) = C(n, n — r) 

This follows from (9.17) by writing n — r instead of r. It is also obvious, 
since, if we pick r things out of n, we pick at the same time the n — r things 
which are left behind. 

Theorem 5. 

(9.19)    C(n, n) = 1 

There is clearly only one way of picking all the n objects, so that C(n, n) 
must be 1. If equation (9.17) is still to apply when r = n, we must interpret 
the symbol 0! as 1, and this is the convention that is followed. Similarly we 
must interpret C(n, r), for r > n, as being zero. 

Example 4. In how many ways can a committee of 3 be chosen from 5 married couples, 
if a husband and wife cannot both sit on the committee? 

We can pick three couples out of the five in C(5, 3) = 5!/(3! 2!) = 10 ways. A member 
can be chosen from each couple in C(2, 1) = 2 ways. The total number of ways is therefore 
10 · 2 · 2 · 2 = 80. 
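Example 4 can be checked by listing all committees of three from the ten people and discarding those that contain a married couple. The labeling of persons as (couple number, "H" or "W") is an assumption made for this sketch.

```python
from itertools import combinations
from math import comb

# Ten people: couple number c consists of persons (c, "H") and (c, "W").
people = [(couple, spouse) for couple in range(5) for spouse in ("H", "W")]

def no_couple_together(committee):
    """True if every member comes from a different couple."""
    couples = [couple for couple, _ in committee]
    return len(set(couples)) == len(couples)

valid = sum(1 for c in combinations(people, 3) if no_couple_together(c))
by_formula = comb(5, 3) * comb(2, 1) ** 3

print(comb(10, 3), "committees in all;", valid, "without a couple")
```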

Theorem 6. If n objects consist of $n_1$ all of one kind, $n_2$ all of another kind, 
and so on, up* to $n_k$ of the kth kind, the total number of selections that can be made 
of 1, 2, 3, up to n objects is 

(9.20)    $(n_1 + 1)(n_2 + 1)\cdots(n_k + 1) - 1$

We may take either none or 1 or 2 or up to $n_1$ of the first kind, giving $n_1 + 1$ 
possibilities. Similarly we may take none or 1 or 2 or up to $n_2$ of the second 
kind, giving $n_2 + 1$ possibilities, and so on. But we exclude the case when 
we select none of any kind. 

Corollary. The total number of selections from n objects all different is $2^n - 1$. 

This is found from (9.20) by putting $n_1 = n_2 = \cdots = n_k = 1$, and noting 
that k = n. 

Example 5. A traveler has in his pocket a nickel, a dime, a quarter, and a half-dollar. 
In how many ways can he give the porter a tip? Ans. $2^4 - 1 = 15$. 
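Formula (9.20) can be compared with a direct count of how many objects of each kind are taken. In the sketch below the function names and the second test case (3 alike, 2 alike, and 1 single object) are hypothetical additions.

```python
from itertools import product
from math import prod

def selections(kind_counts):
    """Formula (9.20): number of selections, the empty one excluded."""
    return prod(n + 1 for n in kind_counts) - 1

def selections_by_listing(kind_counts):
    """List how many of each kind are taken; drop the case of none at all."""
    choices = product(*(range(n + 1) for n in kind_counts))
    return sum(1 for taken in choices if any(taken))

coins = [1, 1, 1, 1]   # Example 5: four different coins, one of each
mixed = [3, 2, 1]      # hypothetical: 3 alike, 2 alike, and 1 single object

print(selections(coins), selections_by_listing(coins))
print(selections(mixed), selections_by_listing(mixed))
```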

Theorem 7. The number of ways of putting m indistinguishable objects into 
n numbered compartments, if any number (including 0) may go into any compartment, 
is 

    $C(n + m - 1, m) = (n + m - 1)!/[m!\,(n - 1)!]$

Proof: We think of the m objects as arranged in a line, thus: 

    0 0 0 0 0 0 0 0 · · · 0

Now we imagine n - 1 vertical lines placed anywhere between these objects, 
thus: 

    0 | 0 0 | | 0 0 0 | 0 | · · · | 0

These partitions will separate n compartments, which will contain none or 
1 or 2 or up to m of the objects, and these compartments may be supposed 
numbered from left to right. The number of possible arrangements will be 
the number of permutations of m + n - 1 objects, of which m are alike of 
one kind and n - 1 alike of another kind (namely, the partitions). By the 
corollary to Theorem 3, this is $(m + n - 1)!/[m!\,(n - 1)!]$. 

* “Up to n” includes n. 

Example 6. There are 7 horses in a race. If a man has 5 dollars to bet with, and can bet 
only in multiples of a dollar, in how many ways can he bet on one or more horses to win? 

Here we have 5 objects (dollars) to put into 7 numbered compartments (ticket windows). 
The number of ways is C(11, 5) = 462. 
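Theorem 7 can be verified for the betting example by counting every way of splitting the dollars among the horses. The sketch assumes, as the example does, that all 5 dollars are bet; the function names are illustrative.

```python
from itertools import product
from math import comb

def placements(m, n):
    """Theorem 7: ways of putting m alike objects into n numbered compartments."""
    return comb(n + m - 1, m)

def placements_by_listing(m, n):
    """Count all tuples of n non-negative compartment counts adding up to m."""
    return sum(1 for counts in product(range(m + 1), repeat=n) if sum(counts) == m)

# Example 6: 5 dollars (objects) distributed over 7 horses (compartments).
print(placements(5, 7), placements_by_listing(5, 7))
```

By the symmetry C(n, r) = C(n, n - r), the value C(11, 5) is the same as C(11, 6).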


9.6 Some Problems in Probability. We now give a few examples of prob- 
lems which can be solved by enumerating the possible cases and the favorable 
cases, and using equation (9.15). 


Example 7. What is the probability of holding 4 aces in a hand at bridge? 

The number of possible hands of 13 cards is C(52, 13). The number of hands containing 
4 aces is equal to the number of ways that the remaining 9 cards can be picked out of the 
48 cards in the deck which are not aces. This number is C(48, 9). The required probability 
is, therefore, 

    $\dfrac{C(48, 9)}{C(52, 13)} = \dfrac{48!}{9!\,39!}\cdot\dfrac{39!\,13!}{52!} = \dfrac{13\cdot 12\cdot 11\cdot 10}{52\cdot 51\cdot 50\cdot 49} = 11/4165$


or about 1 in 379. 

The fundamental assumption here is that every completely specified hand is as likely as 
any other one, dealt from a well-shuffled deck. The hand consisting of all 13 spades is just 
as likely as the hand consisting of clubs K, 10, 3, diamonds A, J, 5, 4, hearts Q, 9, 8, and 
spades 10, 7, 4. There are, however, a very large number of different specified hands, about 
635 billion in fact, and only 4 of these are hands consisting of a complete suit, so that the 
probability of such a hand is extremely small. In actual play the shuffling is seldom very 
thorough and it may be doubted whether the fundamental assumption is justified. 

Example 8. Find the probability of a hand of 13 cards containing 3 clubs, 4 diamonds, 
3 hearts, and 3 spades. 

The number of ways of picking 3 clubs out of the 13 clubs in the deck is C(13, 3). Each 
of these ways may be associated with any of the C(13, 4) ways of picking the diamonds, the 
C(13, 3) ways of picking the hearts, and the C(13, 3) ways of picking the spades. The total 
number of favorable ways is, therefore, the product of these numbers, and since the total 
number of ways is C(52, 13), the required probability is 

    $[C(13, 3)]^3\,C(13, 4)/C(52, 13) = \dfrac{(13!)^3\,13!\,13!\,39!}{(3!\,10!)^3\,4!\,9!\,52!} \approx 0.0263$
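The arithmetic of Examples 7 and 8 is easily carried out with exact integers. This sketch relies on the same assumption as the examples, that every hand of 13 cards is equally likely; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

hands = comb(52, 13)                       # all possible 13-card hands

# Example 7: all four aces plus 9 of the 48 other cards.
p_four_aces = Fraction(comb(48, 9), hands)

# Example 8: 3 clubs, 4 diamonds, 3 hearts, 3 spades.
p_3433 = Fraction(comb(13, 3) ** 3 * comb(13, 4), hands)

print("number of hands:", hands)
print("four aces:", p_four_aces, "or about 1 in", round(1 / p_four_aces))
print("3 clubs, 4 diamonds, 3 hearts, 3 spades:", round(float(p_3433), 4))
```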


9.7 Simple and Compound Events. A simple event is one which can be 
represented by some value or set of values of a single variate x. In some 
cases x is discrete, that is, it can take only isolated values such as 0, 1, 2, 3, · · · . 
In other cases it may range over a finite or infinite interval of the x-axis. The 
throwing of a die and the observation of the number of spots on the upper 
face constitute a simple event, where x can take only the values 1, 2, 3, · · · , 6. 

Another simple “event” would be an adult male having a height of over 6 ft, 
and x would then be any number greater than 6. The event in Example 7, 
that of having a certain number of aces in a hand of bridge, can be represented 
by x = 0, 1, 2, 3, or 4. 

For a compound event we require two or more variates. Thus, if two dice 
are thrown together, any number of spots from 1 to 6 on the first die may be 
associated with any number from 1 to 6 in the second die. There are there- 
fore 36 possible combinations, which can be represented by points (x, y) in 
a plane, where x and y both take the values 1, 2, • • • , 6. These points form a 
square arrangement of isolated dots. By writing out all the possible combinations 
in a table, we can see at a glance how many correspond to a given total: 

    (1) (2)    (1) (2)    (1) (2)    (1) (2)    (1) (2)    (1) (2)
     1   1      2   1      3   1      4   1      5   1      6   1
     1   2      2   2      3   2      4   2      5   2      6   2
     1   3      2   3      3   3      4   3      5   3      6   3
     1   4      2   4      3   4      4   4      5   4      6   4
     1   5      2   5      3   5      4   5      5   5      6   5
     1   6      2   6      3   6      4   6      5   6      6   6

Thus there is only one combination, 6 6, which will give a total of 
12, and the probability of this total is 1/36 if we assume that the dice are 
uniform and well-balanced so that all faces are equally likely to appear. A 
total of 7 can, however, be made up in 6 different ways: 1 6, 2 5, 3 4, 
4 3, 5 2, and 6 1, so that the probability of a total of 7 is 1/6. One can 
check easily that the probabilities of the various possible totals are as given 
in the following table: 


Total           2   3   4   5   6   7   8   9  10  11  12
Prob. (× 36)    1   2   3   4   5   6   5   4   3   2   1

The probabilities are multiplied by 36 to avoid fractions. The sum of the 
probabilities is 1, as it should be. 


Example 9. What is the probability of throwing either 7 or 11 with two dice? 

These events are mutually exclusive and the probability is the sum of the individual 
probabilities. The probability of 7 is 1/6 and that of 11 (from the above table) is 1/18. 
The sum is 2/9. 
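The table of totals and the answer to Example 9 can be reproduced by tallying the 36 combinations. The sketch assumes well-balanced dice, so that each ordered pair has probability 1/36; `ways` and `p_total` are illustrative names.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Tally the totals over the 36 equally likely (x, y) combinations.
ways = Counter(x + y for x, y in product(range(1, 7), repeat=2))

def p_total(total):
    """Probability of a given total with two dice."""
    return Fraction(ways[total], 36)

for total in range(2, 13):
    print(total, ways[total])

# Example 9: totals 7 and 11 are mutually exclusive, so the probabilities add.
p_seven_or_eleven = p_total(7) + p_total(11)
print(p_seven_or_eleven)
```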

Example 10. Show that the probability of throwing 6 at least once in 4 throws of a die is 
a little more than 0.5, but that the probability of throwing double-6 at least once in 24 
throws with two dice is a little less than 0.5. 

The probability that an event happens at least once is the complement of the probability 
that it does not happen at all. If we calculate the latter, we have only to subtract it from 1 
to get the former. The probability of not getting 6 in a single throw is 5/6. Since the 
successive throws of the die are supposed to be independent, the probability of not-6 in four 
throws is $(5/6)^4$. Therefore the probability of at least one six is 

    $1 - (5/6)^4 = 671/1296 = 0.518$

Similarly, the probability of not getting double-6 in a single throw with two dice is 35/36. 
The probability of not getting it in 24 throws is $(35/36)^{24}$ and the probability of at least one 
double-6 is 

    $1 - (35/36)^{24} = 0.491$

This problem was one of the earliest to be solved in the history of the theory of probability. 
The Chevalier de Méré, a gambler at the French court in the middle of the seventeenth 
century, had noticed that, while it paid to bet on the first event, it did not pay to bet on the 
second. This seemed to him unreasonable, and he consulted the mathematician Pascal 
(1623-1662) who worked out the probabilities. 
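The complement rule used in Example 10 fits in a one-line function. This is a sketch assuming independent throws of fair dice; the function name `p_at_least_once` is hypothetical.

```python
from fractions import Fraction

def p_at_least_once(p_single, throws):
    """1 minus the probability that the event fails on every independent throw."""
    return 1 - (1 - p_single) ** throws

one_six_in_4 = p_at_least_once(Fraction(1, 6), 4)
double_six_in_24 = p_at_least_once(Fraction(1, 36), 24)

print(one_six_in_4, round(float(one_six_in_4), 3))
print(round(float(double_six_in_24), 3))
```

The first probability lies a little above one-half and the second a little below, which agrees with de Méré's experience at the gaming table.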


9.8 Continuous Probability. The problems so far considered have been 
examples of discrete variates, when the number of possible values was finite. 
There are other problems, however, where x is a continuous variate, so that 
no enumeration of favorable and total cases is possible. If we consider first 
a simple event, there will be, in general, a probability that x will lie in a specified 
interval out of its whole domain, and this probability will be a function 
of x. There is, for example, a probability that an adult Canadian male, 
selected at random, will have a height less than or equal to 6 ft. The function 
F(x) which expresses the probability that a certain variate has a value 
equal to or less than x is called the distribution function of the variate. If the 
variate follows the normal law, for example, the graph of the distribution 
function is an ogive like that in Fig. 32 of §8.8. For distributions expressed 
by smooth continuous curves, there will also be a probability that x lies in 
an infinitesimal interval of the domain, denoted in the language of calculus 
by dx. Since this probability will be proportional to the size of dx, we denote 
it by f(x) dx, and we call f(x) the probability density, or the probability function, 
of the distribution. If the whole domain of x stretches from $l_1$ to $l_2$, 



    $\int_{l_1}^{l_2} f(x)\,dx = 1$

and if we regard as a “favorable” event one which corresponds to a value of 
x between certain limits $k_1$ and $k_2$, the probability of such an event is given by 

    $\int_{k_1}^{k_2} f(x)\,dx$

In many problems it is reasonable to consider that every value of x within 
its domain is equally likely. If so, f(x) is constant, and we can easily evaluate 
the probability. If f(x) = C, 

    $\int_{l_1}^{l_2} f(x)\,dx = C(l_2 - l_1) = 1$

so that $C = 1/(l_2 - l_1)$. Then 

    $\int_{k_1}^{k_2} f(x)\,dx = C(k_2 - k_1) = (k_2 - k_1)/(l_2 - l_1)$

Thus if a point is selected at random on a line 6 inches long, the probability 
that it lies within an inch either way of the middle point is 1/3, since the 
favorable interval is 2 inches long and the whole domain 6 inches long. 

This principle may be extended to compound events. If, for example, a 
favorable event corresponds to the position of a point (x, y) within a certain 
region R of the x-y plane, and if the whole domain of x and y is a region D, 
then, on the assumption that all positions of the point are equally likely, the 
probability of the favorable event is the ratio of the areas, R/D. 

Example 11. A horizontal flat plate 8 in. square is ruled with a grid of fine lines, 2 in. 
apart, and has a vertical rim all around the edge. If a penny (diameter 3/4 in.) is tossed on to 
the plate, what is the probability that it rests without crossing a line? 

Because of the rim the center of the penny cannot lie within 3/8 in. of the edge of the plate. 
Hence, the effective domain D is a square of side 7 1/4 in. If the penny is not to cross a line of 
the grid, its center cannot lie within 3/8 in. of any such line. The favorable region, therefore, 
consists of 16 squares each of side 1 1/4 in. (the area shaded in Fig. 34), so that 
$R = 16 \times (5/4)^2 = 25$ sq. in. Then the probability required is 
$25/(29/4)^2 = 400/841 = 0.476$, assuming that the center of the penny is equally likely to fall 
anywhere within D. 
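The ratio of areas in Example 11 can be computed exactly and then compared with a simulated toss. The sketch below assumes a penny of diameter 3/4 in. and a center that falls uniformly over D; the seed, the number of trials, and all names are arbitrary choices made for illustration.

```python
import random
from fractions import Fraction

plate, spacing = Fraction(8), Fraction(2)    # inches
radius = Fraction(3, 8)                      # half the 3/4 in. diameter

# Exact areas, following the argument of Example 11.
domain_side = plate - 2 * radius             # center must stay clear of the rim
cell_side = spacing - 2 * radius             # center must stay clear of the lines
cells = int(plate / spacing) ** 2            # 16 squares of the grid
p_exact = cells * cell_side ** 2 / domain_side ** 2

# Simulation: place the center uniformly in D and test both coordinates.
random.seed(34)                              # arbitrary seed for repeatability
r, s = float(radius), float(spacing)

def clear_of_lines(c):
    """True if coordinate c is at least r away from every multiple of s."""
    offset = c % s
    return r <= offset <= s - r

def random_center():
    return random.uniform(r, float(plate) - r)

trials = 100_000
hits = sum(
    1 for _ in range(trials)
    if clear_of_lines(random_center()) and clear_of_lines(random_center())
)
p_simulated = hits / trials

print(p_exact, round(float(p_exact), 3), round(p_simulated, 3))
```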

As another example of a continuous probability distribution, consider a well-balanced, 
smoothly-pivoted horizontal wheel, carrying a mark on its edge and rotating above a fixed 
circular scale (Fig. 35). If the wheel is spun and allowed to come to rest, the pointer may 
be supposed equally likely to indicate any reading on the scale from 0° through 180° to 360°. 
The probability that it stops somewhere between, say, 0° and 90° will then be 1/4. The 
distribution function F(x) is as shown in Fig. 36, where, as usual, F(x) means the probability 
of a value less than x, that is, in this case, of a value between 0 and x. 

Fig. 34          Fig. 35          Fig. 36 (vertical axis F(x), marked 0, 0.25, 0.5, 0.75, 1.0) 

The probability density f(x) is constant and equal to 1/360. That is to say, the probability 
that the pointer stops in an interval dx of the scale is dx/360. The probability is zero 
that the pointer stops precisely at some definite point of the scale, because a point is an 
interval of zero length. However, no point of the scale is an impossible value, because the 
pointer must stop somewhere. A zero probability for an event does not therefore always 
mean that the event is impossible. 

The normal curve is an example of a continuous probability distribution in which the 
probability density is not constant. The probability of a value between x and x + dx is 
$(2\pi)^{-1/2}\sigma^{-1}e^{-(x-\mu)^2/2\sigma^2}\,dx$, and, as we have seen, this probability is a maximum at x = μ and 
falls off to zero symmetrically on both sides. The probability of a value between x = a and 
x = b is the area under the curve between these limits, and is found from the table of areas 
for the standard normal curve. 


9.9 Moments of a Probability Distribution. We can define moments for 
a probability distribution in the same way as for a population, with the understanding 
that relative frequencies are replaced by probabilities. Thus, the 
mean of the distribution is given by 

(9.21)    $\mu = \sum_{i=1}^{k} x_i f(x_i)$

if the distribution is discrete, and by 

(9.22)    $\mu = \int_{l_1}^{l_2} x f(x)\,dx$

if the distribution is continuous. Similarly the variance is given by 

(9.23)    $\sigma^2 = \sum_{i=1}^{k} (x_i - \mu)^2 f(x_i)$

or by 

(9.24)    $\sigma^2 = \int_{l_1}^{l_2} (x - \mu)^2 f(x)\,dx$

according as the distribution is discrete or continuous. Higher moments can 
be obtained, if desired, in the same way. 


Example 12. Calculate the mean and variance for the probability distribution of the 
number of spots with two dice (§9.7). 

Here x can take integral values from 2 to 12 and the corresponding probabilities are given 
in the table preceding Example 9. 

By adding the products of x and f(x) we find from (9.21) that μ = 7. (This is also obvious 
from the symmetry of the distribution.) From (9.23), 

    $\sigma^2 = \sum x_i^2 f(x_i) - 2\mu \sum x_i f(x_i) + \mu^2 \sum f(x_i)$
    $= \sum x_i^2 f(x_i) - \mu^2$

since $\sum f(x_i) = 1$ and $\sum x_i f(x_i) = \mu$. 

We find from the table that 

    $\sum x_i^2 f(x_i) = \frac{1}{36}\,[4\cdot 1 + 9\cdot 2 + 16\cdot 3 + 25\cdot 4 + \cdots + 144\cdot 1] = 1974/36 = 329/6$

Therefore $\sigma^2 = 329/6 - 49 = 35/6$, so that σ = 2.415. 

Example 13. Calculate the mean and variance for the probability distribution of the 
pointer reading in Fig. 36. 

By (9.22), with f(x) = 1/360, 

    $\mu = \dfrac{1}{360}\int_0^{360} x\,dx = \dfrac{1}{360}\cdot\dfrac{(360)^2}{2} = 180$

By (9.24), 

    $\sigma^2 = \dfrac{1}{360}\int_0^{360} (x - 180)^2\,dx = \dfrac{(360)^2}{12} = 10800$

so that σ = 103.9. 
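The moments in Examples 12 and 13 can be recomputed from definitions (9.21) to (9.24). In this sketch the discrete case is handled with exact fractions, while the continuous case uses a midpoint-rule approximation to the integrals; that numerical method and all names are choices made here, not taken from the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Example 12: discrete distribution of the total with two dice.
ways = Counter(x + y for x, y in product(range(1, 7), repeat=2))
f = {total: Fraction(n, 36) for total, n in ways.items()}

mean_dice = sum(x * p for x, p in f.items())                     # (9.21)
var_dice = sum((x - mean_dice) ** 2 * p for x, p in f.items())   # (9.23)

# Example 13: constant density 1/360 on the scale from 0 to 360.
def midpoint_integral(g, a, b, steps=100_000):
    """Approximate the integral of g from a to b by the midpoint rule."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

density = 1 / 360
mean_pointer = midpoint_integral(lambda x: x * density, 0, 360)              # (9.22)
var_pointer = midpoint_integral(lambda x: (x - 180) ** 2 * density, 0, 360)  # (9.24)

print(mean_dice, var_dice, round(float(var_dice) ** 0.5, 3))
print(round(mean_pointer, 3), round(var_pointer, 1), round(var_pointer ** 0.5, 1))
```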


9.10 Mathematical Expectation. Let $A_1, A_2, \ldots, A_n$ be mutually 
exclusive events, of which one must happen, and let their probabilities of 
occurrence be $p_1, p_2, \ldots, p_n$. Suppose you will receive a sum of money $\$x_k$ 
if event $A_k$ happens. Then we say that your mathematical expectation of 
gain is 

(9.25)    $E(x) = \sum_{k=1}^{n} p_k x_k$

For example, if you buy a ticket in a lottery in which there is one prize of 
$1000 and ten prizes of $50, and if 10,000 tickets are sold, your mathematical 
expectation is 


    $\dfrac{1}{10{,}000}\,(1000) + \dfrac{10}{10{,}000}\,(50) + \dfrac{9989}{10{,}000}\,(0) = \$0.15$


since you have a probability 1/10,000 of winning the first prize, a probability 
10/10,000 of winning one of the other prizes, and a probability 9989/10,000 
of winning nothing. Mathematical expectation is therefore different from 
expectation in the ordinary sense of the term, since you do not really “expect” 
to get 15 cents. This sum is, however, the fair price which you should pay 
for a ticket, in the sense that if you continued indefinitely buying tickets in 
similar lotteries, millions of them, your average net gain per ticket would be 
zero. When a man pays a premium for a term insurance policy, he is in effect 
playing a similar game. His beneficiary will receive the stipulated sum if he 
dies, and nothing if he lives. The probabilities are assessed from mortality 
tables, and the premium is a fair price (apart from the rather high “loading” 
for administration expenses, commissions, etc.). 
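The lottery expectation is a direct application of (9.25). A minimal sketch, with the prize list written as (prize, number of winning tickets) pairs, a layout chosen here for illustration:

```python
from fractions import Fraction

tickets = 10_000
# (prize in dollars, number of tickets that win it)
prizes = [(1000, 1), (50, 10), (0, tickets - 11)]

probabilities = [Fraction(count, tickets) for _, count in prizes]
expectation = sum(Fraction(count, tickets) * prize for prize, count in prizes)

print(sum(probabilities), expectation, float(expectation))
```

The probabilities add to 1, as they must for mutually exclusive events of which one is certain to happen.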

The term “mathematical expectation,” or simply expectation, is used in 
a wider sense. If $f(x_i)$ is the probability that a variate x takes the value 
$x_i$ (i = 1, 2, · · · , k), the expectation of x is defined as 

(9.26)    $E(x) = \sum_{i=1}^{k} x_i f(x_i)$

Similarly, if x is continuous, with probability density f(x), $l_1 < x < l_2$, 

(9.27)    $E(x) = \int_{l_1}^{l_2} x f(x)\,dx$

The expectation is thus the same as the mean μ for a probability distribution. 
Sometimes E(x) is called the expected value of x. 

Example 14. A variate x has the probability density f(x) = 0 for x < 0, 
$f(x) = \frac{3}{8}(x - 2)^2$ for 0 < x < 2, f(x) = 0 for x > 2. 
The graph of f(x) is shown in Fig. 37. 

Find the expected value of x and its standard deviation. 

    $E(x) = \int_0^2 x f(x)\,dx = \dfrac{3}{8}\int_0^2 (x^3 - 4x^2 + 4x)\,dx$
    $= \dfrac{3}{8}\left[\dfrac{x^4}{4} - \dfrac{4x^3}{3} + 2x^2\right]_0^2 = \dfrac{1}{2}$

The expectation of $x^2$ is similarly defined as 

    $E(x^2) = \int_0^2 x^2 f(x)\,dx = \dfrac{3}{8}\int_0^2 (x^4 - 4x^3 + 4x^2)\,dx = \dfrac{3}{8}\cdot\dfrac{16}{15} = \dfrac{2}{5}$

The variance of x is $E(x^2) - [E(x)]^2 = \frac{2}{5} - \frac{1}{4} = \frac{3}{20} = 0.15$, so 
that the standard deviation of x is $(0.15)^{1/2} = 0.39$. 
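Because the density in Example 14 is a polynomial, its moments can be integrated exactly term by term. The sketch below assumes the density $\frac{3}{8}(x - 2)^2$ on 0 < x < 2; the helper names `integrate_polynomial` and `moment` are hypothetical.

```python
from fractions import Fraction

def integrate_polynomial(coeffs, a, b):
    """Exact integral from a to b of sum(coeffs[k] * x**k)."""
    def antiderivative(x):
        return sum(Fraction(c, k + 1) * x ** (k + 1) for k, c in enumerate(coeffs))
    return antiderivative(Fraction(b)) - antiderivative(Fraction(a))

# f(x) = (3/8)(x - 2)^2 = (3/8)(4 - 4x + x^2), coefficients by power of x.
f = [Fraction(3, 8) * c for c in (4, -4, 1)]

def moment(order):
    """E(x^order): multiplying by x^order shifts each coefficient up by `order`."""
    return integrate_polynomial([0] * order + f, 0, 2)

total_probability = moment(0)
expected_x = moment(1)
variance = moment(2) - expected_x ** 2
std_dev = float(variance) ** 0.5

print(total_probability, expected_x, moment(2), variance, round(std_dev, 2))
```

The zeroth moment comes out as 1, which confirms that the factor 3/8 makes the total probability over the domain equal to unity.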


9.11 Statistics and Probability. The theory of probability, as outlined 
in this chapter, forms the basis for statistical inference which will occupy us 
in some later chapters. The laws of probability are exact, theoretical laws: 
the extent to which they apply to events in the real world can be decided only 
by experience. 

The various kinds of empirical data which form the subject matter of 
statistics have one element in common, namely, an element of randomness or 
unpredictability. Although we may feel sure that the tossing of a coin is a 
mechanical process governed by known physical laws, yet the result (head or 
tail) is in practice quite unpredictable, at least with an ordinary coin spun in 
the ordinary way. The final state is so dependent on minute changes in the 
initial position and angular velocity of the coin that we cannot possibly calculate
what it will be. Similarly, the yield of corn in a given year from a given
plot of land is dependent on a multitude of variable factors affecting the
climate, quality of seed, etc., so that the exact value of the yield cannot be
known beforehand. The fluctuations from year to year in the date of Easter
might seem to a person ignorant of Church history an example of a random
process, but there exists, of course, an exact mathematical formula from
which the date in any year can be predicted. The date of Easter is not,
therefore, a random variable of the kind contemplated in statistics.

The bridge between the theory of probability and the behavior of statistical 
data is provided by the empirical fact that in a long series of random experi- 
ments or observations the relative frequency of a particular result shows a 
marked tendency to settle down to a constant value. In spite of the irregular- 
ities of individual tosses of a coin, and even occasional long runs of heads or 
tails, the proportion of heads to the total number of tosses always approxi- 
mates, as this number increases, to a fixed value which is taken as the probabil- 
ity of head with the particular coin used. In general we can regard it as an 
axiom based on experience that the relative frequencies of any particular 
observed results in a long series of random experiments, performed under 
uniform conditions, will show this same kind of long-run stability and may 
therefore be replaced approximately by probabilities. 
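The long-run stability of a relative frequency can be illustrated by simulation. The sketch below is an illustration only, not part of the original text: it uses Python's pseudo-random generator as a stand-in for a real coin, and the function name `running_proportion`, the checkpoint sizes, and the seed are arbitrary assumptions.

```python
import random

def running_proportion(n_tosses, p_head=0.5, seed=0):
    """Simulate tosses and record the proportion of heads at a few checkpoints."""
    rng = random.Random(seed)   # fixed seed so that the run can be repeated
    heads = 0
    checkpoints = {}
    for n in range(1, n_tosses + 1):
        if rng.random() < p_head:
            heads += 1
        if n in (10, 100, 1000, 10000, 100000):
            checkpoints[n] = heads / n
    return checkpoints

for n, prop in running_proportion(100_000).items():
    print(f"{n:>7} tosses: proportion of heads = {prop:.4f}")
```

The proportions for small numbers of tosses may wander appreciably, while those for the longer series should lie close to the assumed probability.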

In practice, the long series of random experiments is often hypothetical. 
One or two experiments only are actually made, but these are regarded as 
samples from a very large, possibly even infinite, set of experiments that
might conceivably be carried out, given unlimited time, opportunity, and 
money. The probabilities which are deduced relate to this hypothetical 
population. 

9.12 Mathematical Models. When a set of statistical data exhibits some
definite regularities we may be able to form a mathematical model of the
process from which further consequences may be deduced. If, for example,
a set of tosses of a coin shows a proportion of heads approximately one-half,
we can replace the actual sequence of tosses, for mathematical purposes, by a
random variable $x$ which can take on the two values $x = 0$ and $x = 1$, each
with probability $\tfrac{1}{2}$. This enables us to deduce the variance and other moments
of the distribution.

In Chapter VIII we saw that the observed distribution of weights of 
1000 Glasgow schoolgirls could be well fitted by a normal curve. In the 
process of fitting we replaced the irregular actual distribution by a symmetri- 
cal mathematical model, according to which the probability that a schoolgirl 
belonging to the population sampled would have a weight between $x$ and
$x + dx$ pounds is given by

$p(x)\,dx = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2}\,dx$

$\mu$ and $\sigma$ being constants. This model is convenient for making further
inferences about the population.

We shall have many examples in later chapters of the process of setting up a 
mathematical model. A variable $x$ may, for instance, be suspected of increasing
on the average steadily with the time $t$ (at least, over a certain interval
of time) and of being also subject to random fluctuations. This dependence 
on t may be expressed by the mathematical model 

$x = a + bt + \epsilon, \quad \epsilon = N(0, \sigma^2)$

which means that, apart from the random component $\epsilon$, $x$ is a linear function
of $t$, and that the random component itself is normally distributed with mean
zero and variance $\sigma^2$. (See end of §8.3.) The assumption of normality is not
necessary, but it is a great convenience mathematically. There is a danger 
in using this, or any other, mathematical model, that the assumptions made 
may not actually be satisfied by the data supposed to be represented by the 
model. If there is grave doubt about this, the model may have to be changed. 
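A model of this kind is easily simulated, which gives some feeling for what the assumptions imply. The following sketch is an illustration only: the constants `a`, `b`, and `sigma` are hypothetical values chosen for the purpose, and the function name `simulate_linear_model` is not taken from the text.

```python
import random

def simulate_linear_model(a, b, sigma, times, seed=1):
    """Draw x = a + b*t + eps for each t, with eps normal (mean 0, s.d. sigma)."""
    rng = random.Random(seed)
    return [a + b * t + rng.gauss(0.0, sigma) for t in times]

# Hypothetical constants, chosen only for illustration.
a, b, sigma = 10.0, 0.5, 2.0
times = list(range(2000))
xs = simulate_linear_model(a, b, sigma, times)

# Residuals about the line a + b*t should behave like the random component.
residuals = [x - (a + b * t) for x, t in zip(xs, times)]
n = len(residuals)
resid_mean = sum(residuals) / n
resid_var = sum((r - resid_mean) ** 2 for r in residuals) / n

print(f"mean of residuals     = {resid_mean:.3f}")
print(f"variance of residuals = {resid_var:.3f}")
```

If data supposed to follow the model gave residuals whose mean or variance departed widely from zero and $\sigma^2$, that would be one indication that the model needs to be changed.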


Exercises 

1. If A and B are independent events, with P{A) *= and P{B1 f, what in 
PlA-fB)? Ans. f Hint. P{AB\ ^ P{A]P{B]. 

2. UP{A] -i,PlBi -f,andPU+B) « what is P1B|A 1? Ans, ^ 

3. Prove that if $P\{B \mid A\} = P\{B\}$, then also $P\{B \mid \bar{A}\} = P\{B\}$, $P\{A \mid B\} = P\{A\}$ and
$P\{A \mid \bar{B}\} = P\{A\}$.

4. If A, B, C are three events, prove that

$P\{A + B + C\} = P\{A\} + P\{B\} + P\{C\} - P\{AB\} - P\{AC\} - P\{BC\} + P\{ABC\}$.

Hint. Denote the event B + C by D. Use (9.9) for $P\{A + D\}$ and then for $P\{D\}$.
Note that $P\{AD\} = P\{AB + AC\} = P\{AB\} + P\{AC\} - P\{ABC\}$.

5. (a) How many 5-digit numbers are there with every digit odd? (b) How many are
there with no digit lower than 6? Ans. (a) 3125; (b) 1024.

6. How many numbers greater than a million can be formed from the digits 2, 3, 0, 3, 4,
2, 3? Ans. 360.

7. How many arrangements can be made of the letters of the word "draught," if the
vowels are never separated? Ans. 1440.

8. (a) How many "words" (i.e., arrangements of letters) can be formed using all the
letters of the word "article"? (b) In how many of these do vowels occupy the even places?
Ans. (a) 5040; (b) 144.

9. If 4P(n, 3) = 6P(n - 1, 3), what is n?

10. At a dinner table the host and hostess sit opposite each other. In how many ways
can 2n guests be arranged so that two particular guests do not sit together?

Ans. $2(2n^2 - 3n + 2)(2n - 2)!$

11. Four strangers board a bus, in which there are 6 empty seats. In how many different 
ways can they be seated? 

12. Six examination papers are set in a comprehensive examination, two of them in
mathematics. In how many different orders can the papers be given if the two mathematics
papers are not to be successive? Ans. 480.

13. An 8-oared boat is to be manned by a crew chosen from 11 men. In how many ways
can the crew be chosen and arranged if 3 men can steer but cannot row, and the rest can row
but cannot steer, and if 2 men can row only on the port side? Ans. 25,920.

14. Show that the number of ways in which p positive and n negative signs may be placed
in a row so that no two negative signs shall be together is C(p + 1, n), for all nonnegative
integral values of p and n.
Hint. Use Theorem 7. The n negative signs can be thought of as partitions between
n + 1 compartments in which + signs are distributed. n - 1 of these must be placed one
in each compartment, except the end ones. The rest can be distributed at pleasure.

15. Prove that $rC(n, r) = nC(n - 1, r - 1)$.

16. Prove that $\sum_{r=1}^{n} rC(n, r) = n\,2^{n-1}$.

Hint. Use exercise 15. $\sum_{r=1}^{n} C(n - 1, r - 1) = \sum_{r=0}^{n-1} C(n - 1, r)$, which is the total number
of ways of selecting 0 or 1 or 2 ... or n - 1 out of n - 1 different things.

17. Six cards are drawn at random from a deck of 52 cards. What is the probability
that 3 will be red and 3 black? Ans. 0.332.

18. If 4 cards are drawn at random from a deck of 52 cards, what is the probability that
there will be one card of each suit? Ans. 0.1055.

19. If 30 similar balls are placed at random in 8 bags, empty bags being admissible, what
is the probability that no bag contains less than 3 balls? Ans. 13/77,996.

20. A room has 3 lamp sockets. From a collection of 10 light bulbs, of which 6 are no
good, a person selects 3 at random and puts them in the sockets. What is the probability
that he will have light? Ans.

Hint. Find the probability of not getting light, i.e., of selecting 3 bad bulbs.

21. If C(n, 12) = C(n, 8), what is n? What is C(22, n)?

22. If a man has 6 friends, in how many ways can he invite one or more of them to
dinner? Ans. 63.

23. An urn contains 12 balls of which 3 are marked. If 5 balls are drawn out together,
what is the probability that all three of the marked balls are among these 5? Ans.

24. If p is the probability of occurrence of an event in a single trial, show that the
probability of at least one occurrence in n independent trials is $1 - (1 - p)^n$.

25. What is the chance of throwing 6 with a die at least once in 5 trials? Ans. 0.598.

26. What is the chance that a hand at bridge contains the Ace and King of Spades?
Ans. -j^. 

27. A batch of 1000 lamps is known to have 5% defectives. If 5 lamps chosen at random
are tested, what is the probability that none of them will be defective? What is the
probability that exactly 2 defectives will be found?

Ans. (a) C(950, 5)/C(1000, 5); (b) C(50, 2)C(950, 3)/C(1000, 5).

28. A manufacturer supplies cheap clocks in lots of 50. A buyer, before taking a lot,
tests a random sample of 5 clocks, and if all are good he accepts the lot. Otherwise he refuses
it. What is the probability that he will accept a lot containing 10 defective clocks? What
is the probability that he will reject a lot containing only one defective clock?

Ans. (a) 0.31; (b) 0.1.

29. A factory produces a certain type of screw, put up in boxes of 100. Boxes are
inspected by taking 20 screws at random out of the box and rejecting the box if any defectives
are found. What is the probability of passing a box containing 2 defective screws?

Ans. 0.638.

30. An enemy factory covers 2.5 acres, and the power plant in this factory occupies
200 square yards. If a bomb is dropped on the factory from a high altitude, it may be
supposed equally likely to strike anywhere. What is the probability that, if a bomb does hit the
factory, it hits the power plant? How many bombs must be dropped to give a probability
of over 0.9 that at least one will hit the power plant (1 acre = 4840 sq yd)?

Ans. (a) 2/121; (b) 139 at least.

31. Two points are marked at random on a straight line of length a. What is the
probability that the distance between them will exceed c, where c < a? Ans. $(a - c)^2/a^2$.

Hint. If the first point is not within a distance c of either end of the line, the second
point can be anywhere except in the interval 2c centered on the first point. The probability
of the combined event is $(a - 2c)^2/a^2$. If the first point is within distance x of either end
(x < c), the excluded interval is c + x. The probability of the joint event for any x < c is



32. Calculate the mean and variance and sketch the graph of the following rectangular
distribution:

$f(x) = \tfrac{1}{2}$, $-1 < x < 1$
$f(x) = 0$ for $x < -1$ and for $x > 1$

Sketch the distribution function $F(x)$.

33. Calculate the mean and variance and sketch the graph of the following triangular
distribution:

$f(x) = 2(x + \sqrt{2}/2)$, $-\sqrt{2}/2 < x < 0$
$f(x) = 2(\sqrt{2}/2 - x)$, $0 < x < \sqrt{2}/2$
$f(x) = 0$, $x < -\sqrt{2}/2$ or $x > \sqrt{2}/2$

Ans. 0,

Hint. Integrate separately for the two regions $-\sqrt{2}/2$ to 0 and 0 to $\sqrt{2}/2$.

34. The probability density of a variate x is
f(x) = 0, x < 1

-“3x® 3x* 9x 

— + T-T’ 

f(x) =0, X > 3 


(a) Verify that the area under the curve is unity. (b) Sketch the graph. (c) Find the
mean and standard deviation of x. (d) Find the probability that 1 < x < 1 

Ans. (c) 2.1, 0.436; (d) 63/612. 

35. A continuous distribution of a variate x is defined by

/W =|, 0<X<1 

' m = h 1 < X < 2 

, /(x) = i(3 - x), 2 < X < 3 


Sketch the distribution and find the variance of x. Ans. -j^. 


36. Two defective radio tubes were accidentally placed in a box with 5 nondefective tubes.
The tubes are tested one at a time until the second defective is found. Compute the
probabilities f(x) that the xth tube tested is the second defective. Find the mean and variance
of x. Ans. 5 1/3, 2 2/9.

Hint. x = 2, 3, 4, 5, 6, or 7. Find the probability that exactly one tube is defective in
the first X — 1 tested and that then the next tube tested is also defective. Evaluate for the 
different values of x. 


37. A bag contains 5 nickels and a quarter, all being wrapped in paper, so as to be
indistinguishable. A boy is allowed to draw one coin at a time and keep it until he draws the
quarter, when he must stop. What is his expectation? Ans. 37.5 cents.

38. A throws 6 pennies on the table, and pays B 5 dollars if either 5 heads or 5 tails appear,
and 6 dollars if 6 heads or 6 tails appear. In every other case he takes B's stake. How
much should this stake be to make the game fair? Ans. $1.44.

39. There are three identical-appearing envelopes in a drawer. One contains two $1
bills and one $10 bill, the second contains one $1 bill and two $10 bills, the third contains
three $1 bills. If a man is allowed to pick one envelope and draw one bill from the envelope
without looking at it, what is his expectation? Ans. $4.

Hint. The conditions are equivalent to picking one bill at random from 3 tens and 6 ones.

40. A circle of diameter 8 inches is drawn in the interior of a square of side 12 in. A
penny (diameter 3/4 in.) is dropped on the square, which is lying on a horizontal table. If
only those cases are counted when the penny lies completely inside the square, what is the
probability that at least part of the coin lies outside the circle? Ans. 0.674.

41. The floor of a large room is made of hardwood, laid in strips one inch wide, with
cracks between of negligible width. A coin of diameter 1-J in. is dropped on the floor. Find
the probability that the coin touches three strips.
CHAPTER X

THE BINOMIAL AND POISSON DISTRIBUTIONS

10.1 A Coin-tossing Problem. Suppose we toss a coin $s$ times and ask
what is the probability of getting exactly $x$ heads. We can symbolize the
results of a set of tosses by a row of 0's and 1's, in which a 0 means "head" and
a 1 means "tail." Thus,

00101111010011000101 ···

We will suppose that, for this coin, head and tail are equally likely to fall
uppermost, so that the probability of head on a single toss is $\tfrac{1}{2}$. We do not
care what the arrangement of 0's and 1's may be, as long as there are just $x$ 0's
and $s - x$ 1's. The number of favorable arrangements is therefore the number
of permutations of $s$ things, $x$ alike of one kind and $s - x$ alike of another
kind, and by Theorem 3 (Corollary) of Chapter IX this is $s!/[x!(s - x)!]$. The
total number of ways in which the tosses may turn out is $2^s$, since in each of
the $s$ independent tosses there are 2 possibilities. The probability required
is therefore

(10.1)    $f(x) = \frac{s!}{x!(s - x)!} (\tfrac{1}{2})^s = C(s, x)(\tfrac{1}{2})^s$

This is a discrete distribution, since $x$ is obviously an integer between 0 and
$s$ inclusive. Thus, for $s = 4$, $x$ can take the values 0, 1, 2, 3, or 4, with
probabilities given by

x        0      1      2      3      4
f(x)    1/16   4/16   6/16   4/16   1/16

The probability of exactly 3 heads in 4 tosses is therefore $\tfrac{4}{16} = \tfrac{1}{4}$. The
probability of at least 2 heads is $(6 + 4 + 1)/16 = \tfrac{11}{16}$, and so on.
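The probabilities for the coin-tossing problem can be tabulated mechanically. The following is a minimal sketch in Python, not part of the original text; the function name `coin_toss_distribution` is illustrative, and exact fractions are used so that the values agree with those worked by hand.

```python
from fractions import Fraction
from math import comb

def coin_toss_distribution(s):
    """Return [f(0), ..., f(s)] with f(x) = C(s, x) * (1/2)**s, as in (10.1)."""
    return [Fraction(comb(s, x), 2 ** s) for x in range(s + 1)]

probs = coin_toss_distribution(4)
exactly_three = probs[3]        # probability of exactly 3 heads in 4 tosses
at_least_two = sum(probs[2:])   # probability of 2, 3, or 4 heads

print(probs)
print(exactly_three, at_least_two)
```

Note that `Fraction` reduces each value to lowest terms, so that 4/16 is reported as 1/4 and 6/16 as 3/8.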

10.2 Binomial Coefficients. The quantities $C(s, x)$ which appear in
(10.1) are called binomial coefficients because they can be obtained by
expanding the binomial expression $q + p$ raised to the $s$th power. Thus

(10.2)    $(q + p)^2 = q^2 + 2qp + p^2$
          $(q + p)^3 = q^3 + 3q^2p + 3qp^2 + p^3$
          $(q + p)^4 = q^4 + 4q^3p + 6q^2p^2 + 4qp^3 + p^4$
          etc.

The coefficients in the first line of (10.2) are C(2, 0), C(2, 1), and C(2, 2);
those in the second line are C(3, 0), C(3, 1), C(3, 2), and C(3, 3); and so on.
In general, for any positive integer $s$,

(10.3)    $(q + p)^s = q^s + C(s, 1)q^{s-1}p + C(s, 2)q^{s-2}p^2 + \cdots + p^s$

where we have written $C(s, 0) = C(s, s) = 1$. The general term in this
expression is $C(s, x)q^{s-x}p^x$, and if we put $p = q = \tfrac{1}{2}$, so that $q^{s-x}p^x = (\tfrac{1}{2})^s$,
we get precisely the $f(x)$ of (10.1). The probability of getting exactly $x$
heads in $s$ tosses is therefore the $(x + 1)$th term in the expansion of the
binomial $(\tfrac{1}{2} + \tfrac{1}{2})^s$.

We now prove that

(10.4)    $C(s, x - 1) + C(s, x) = C(s + 1, x)$

Proof:

$C(s, x - 1) + C(s, x) = \frac{s!}{(x - 1)!(s - x + 1)!} + \frac{s!}{x!(s - x)!}$
$= \frac{s!}{(x - 1)!(s - x + 1)!}\left[1 + \frac{s - x + 1}{x}\right]$
$= \frac{s!}{(x - 1)!(s - x + 1)!} \cdot \frac{s + 1}{x} = \frac{(s + 1)!}{x!(s + 1 - x)!}$


which is $C(s + 1, x)$. This result is the basis for a method of writing down
the coefficients for different values of $s$. The scheme is generally called
Pascal's Triangle, after Blaise Pascal (1623-1662), although it seems to
have been known much earlier. The coefficients for successive values of
$s$ (0, 1, 2, 3, ···) are written on successive lines in the form of a triangle (Table
34), and in any line each entry is the sum of the two entries in the line above
to its immediate left and right. Thus in the 5th line ($s = 4$), we have
4 = 1 + 3, 6 = 3 + 3, etc.

Table 34. Pascal's Triangle

 s
 0                                   1
 1                                1     1
 2                             1     2     1
 3                          1     3     3     1
 4                       1     4     6     4     1
 5                    1     5    10    10     5     1
 6                 1     6    15    20    15     6     1
 7              1     7    21    35    35    21     7     1
 8           1     8    28    56    70    56    28     8     1
 9        1     9    36    84   126   126    84    36     9     1
10     1    10    45   120   210   252   210   120    45    10     1
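The rule for building the triangle translates directly into a short program. The sketch below is an illustration in Python, not part of the original text; it forms each line from the one above by the addition rule (10.4), and the function name `pascal_triangle` is an assumption made for the example.

```python
from math import comb

def pascal_triangle(max_s):
    """Return the lines s = 0, 1, ..., max_s of Pascal's triangle."""
    rows = [[1]]
    for _ in range(max_s):
        prev = rows[-1]
        # Interior entries follow (10.4): C(s, x-1) + C(s, x) = C(s+1, x).
        # The two end entries are C(s+1, 0) = C(s+1, s+1) = 1.
        interior = [prev[i - 1] + prev[i] for i in range(1, len(prev))]
        rows.append([1] + interior + [1])
    return rows

rows = pascal_triangle(10)
for s, row in enumerate(rows):
    print(f"{s:>2}: " + " ".join(str(n) for n in row))

# Cross-check against the factorial formula for C(s, x).
agrees = all(rows[s][x] == comb(s, x) for s in range(11) for x in range(s + 1))
print(agrees)
```

Only additions are needed, which is the practical attraction of the scheme when the coefficients for many successive values of $s$ are wanted.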
10.3 The Binomial Distribution. Instead of the tossing of a coin let us
consider an event $A$ of which the probability in a single trial is $p$. We suppose
that we can make a succession of independent trials, and that in each one the
event $A$ either happens or does not happen. If, for example, $A$ is the throwing
of a six with a good die, $\bar{A}$ is the event of not throwing a six (that is, of
throwing any other number), and since either $A$ or $\bar{A}$ must happen,
$P\{A\} + P\{\bar{A}\} = 1$. We shall denote the probability of $\bar{A}$ by $q$, and therefore

(10.5)    $p + q = 1$

We require the probability of exactly $x$ successes in $s$ trials, a success being
the event "$A$ happens." Since the trials are independent, the product law
of probability holds, and the probability of any given succession of $A$'s and
$\bar{A}$'s, such as

$A\,A\,\bar{A}\,\bar{A}\,\bar{A}\,\bar{A}\,A\,\bar{A}\,A\,A\,\bar{A}\,A\,\bar{A}\,\bar{A} \cdots$

is    $p\,p\,q\,q\,q\,q\,p\,q\,p\,p\,q\,p\,q\,q \cdots$

If there are $x$ $A$'s and $(s - x)$ $\bar{A}$'s in this sequence, the probability is $p^x q^{s-x}$.
But if we are interested only in the total number of $A$'s and $\bar{A}$'s, and do not
care where they come in the sequence, we must multiply this probability by
the number of ways of permuting $x$ $A$'s and $(s - x)$ $\bar{A}$'s, all ways being
supposed equally likely. This number is $C(s, x)$, and therefore the required
probability is

(10.6)    $f(x) = C(s, x)\,p^x q^{s-x} = \frac{s!}{x!(s - x)!}\, p^x q^{s-x}$

which, as we have seen, is precisely the general term in the expansion of
$(q + p)^s$. The distribution given by (10.6), for values of $x = 0, 1, 2, \ldots, s$,
is called the binomial distribution, or the Bernoulli distribution, after James
Bernoulli (1654-1705) who stated the foregoing result in his posthumously
published Ars Conjectandi in 1713.

In order to calculate successive values of $f(x)$, it is usually convenient to
use a recursion formula, that is, a formula which gives $f(x + 1)$ when $f(x)$ is
known. Thus by starting with $f(0)$ we can form successively $f(1)$, $f(2)$,
and so on. The formula is

(10.7)    $f(x + 1) = \frac{s - x}{x + 1} \cdot \frac{p}{q}\, f(x)$

Proof: By (10.6),

$f(x + 1) = \frac{s!}{(x + 1)!(s - x - 1)!}\, p^{x+1} q^{s-x-1}$

Therefore, dividing $f(x + 1)$ by $f(x)$, we have

$\frac{f(x + 1)}{f(x)} = \frac{x!(s - x)!}{(x + 1)!(s - x - 1)!} \cdot \frac{p^{x+1} q^{s-x-1}}{p^x q^{s-x}} = \frac{s - x}{x + 1} \cdot \frac{p}{q}$

Now

$f(0) = p^0 q^s = q^s, \quad f(1) = \frac{s - 0}{0 + 1} \cdot \frac{p}{q}\, f(0) = \frac{s}{1} \cdot \frac{p}{q}\, f(0)$

$f(2) = \frac{s - 1}{2} \cdot \frac{p}{q}\, f(1)$

and so on.

If, for example, $s = 5$ and $p = \tfrac{1}{6}$, we can calculate the respective values
of $f(x)$ for $x$ = 0, 1, 2, 3, 4, 5, as follows:

$f(0) = (\tfrac{5}{6})^5 = 0.4019$
$f(1) = 5 \cdot \tfrac{1}{5} f(0) = 0.4019$
$f(2) = \tfrac{4}{2} \cdot \tfrac{1}{5} f(1) = 0.4 f(1) = 0.1608$
$f(3) = \tfrac{3}{3} \cdot \tfrac{1}{5} f(2) = 0.2 f(2) = 0.0322$
$f(4) = \tfrac{2}{4} \cdot \tfrac{1}{5} f(3) = 0.1 f(3) = 0.0032$
$f(5) = \tfrac{1}{5} \cdot \tfrac{1}{5} f(4) = 0.04 f(4) = 0.0001$

Apart from a slight error of rounding-off, these add up to 1, as they should.
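The recursion formula lends itself to machine computation. The following is a minimal sketch in Python, not part of the original text; it starts from $f(0) = q^s$ and applies $f(x + 1) = \frac{s - x}{x + 1} \cdot \frac{p}{q} f(x)$ repeatedly. The function name `binomial_by_recursion` is illustrative, and the sketch assumes $0 < p < 1$ so that division by $q$ is permissible.

```python
def binomial_by_recursion(s, p):
    """Return [f(0), ..., f(s)] for the binomial law, built by the recursion (10.7)."""
    q = 1.0 - p
    probs = [q ** s]                      # f(0) = q**s
    for x in range(s):
        ratio = (s - x) / (x + 1) * p / q  # f(x+1) / f(x)
        probs.append(probs[-1] * ratio)
    return probs

probs = binomial_by_recursion(5, 1 / 6)
for x, fx in enumerate(probs):
    print(f"f({x}) = {fx:.4f}")
print(f"sum = {sum(probs):.6f}")
```

Because the unrounded values are carried forward at each step, the sum here differs from unity only by the error of floating-point arithmetic, not by the rounding of each value to four places.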


The binomial distribution may be represented by a histogram. If we
construct rectangles of unit base, centered at $x = 0, 1, 2, \ldots, s$, with
heights equal to $f(x)$, the total area of the histogram will be unity. The
histogram for $s = 5$ and $p = \tfrac{1}{6}$ is shown in Fig. 38.

Fig. 38

Notice that in this distribution all the values of $x$ are actually concentrated
at the centers of the intervals, so that there is no grouping error.
The distribution function is stepped, like that in Fig. 8 of §2.6.

The modal value of $x$ in the binomial distribution may be found as
follows:

If $\hat{x}$ is the mode, $f(\hat{x})$ must be at least as great as the frequencies for the
adjacent values $\hat{x} - 1$ and $\hat{x} + 1$, that is,

$\frac{f(\hat{x})}{f(\hat{x} - 1)} \ge 1$ and $\frac{f(\hat{x})}{f(\hat{x} + 1)} \ge 1$

Substituting the appropriate values for $f(\hat{x})$, $f(\hat{x} + 1)$ and $f(\hat{x} - 1)$ from (10.6),
we find

$\frac{f(\hat{x})}{f(\hat{x} - 1)} = \frac{s!}{\hat{x}!(s - \hat{x})!} \cdot \frac{(\hat{x} - 1)!(s - \hat{x} + 1)!}{s!} \cdot \frac{p}{q} = \frac{s - \hat{x} + 1}{\hat{x}} \cdot \frac{p}{q}$

and similarly

$\frac{f(\hat{x})}{f(\hat{x} + 1)} = \frac{\hat{x} + 1}{s - \hat{x}} \cdot \frac{q}{p}$

Hence we see that $\hat{x}$ satisfies simultaneously the relations

$\frac{s - \hat{x} + 1}{\hat{x}} \cdot \frac{p}{q} \ge 1, \quad \frac{\hat{x} + 1}{s - \hat{x}} \cdot \frac{q}{p} \ge 1$

These can be written

$(s + 1)p \ge (p + q)\hat{x}, \quad sp - q \le (p + q)\hat{x}$

or, since $p + q = 1$,

(10.8)    $\hat{x} \le sp + p, \quad \hat{x} \ge sp + p - 1$

Since $\hat{x}$ is an integer, $\hat{x}$ is uniquely determined by the relations (10.8), unless
$(s + 1)p$ happens also to be an integer. If so, the two values $sp + p$ and
$sp + p - 1$ correspond to equal values of $f(x)$ and the mode is not unique.
This is the situation in Fig. 38, where $(s + 1)p = 1$, and $f(0) = f(1)$.

Example 1. What is the most probable number of times that an ace will appear if a die
is tossed (a) 50 times, (b) 53 times?

Assuming that the probability of ace is $\tfrac{1}{6}$, we have (a) $s = 50$, $p = \tfrac{1}{6}$, $\hat{x} \le 8.5$, and
$\hat{x} \ge 7.5$, so that the most probable value is $\hat{x} = 8$; (b) $s = 53$, $p = \tfrac{1}{6}$, $\hat{x} \le 9$, and
$\hat{x} \ge 8$, so that now the numbers 8 and 9 are equally probable.
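The rule for the mode can be put in the form of a short routine. The following sketch in Python is an illustration only, not part of the original text; it returns every integer lying between $(s + 1)p - 1$ and $(s + 1)p$ inclusive, so that two values are returned when $(s + 1)p$ is an integer. The function name `binomial_modes` is an assumption, and $p$ is passed as an exact fraction so that the test for an integral value of $(s + 1)p$ is not upset by rounding.

```python
from fractions import Fraction
from math import ceil, floor

def binomial_modes(s, p):
    """Return the integers x with (s+1)p - 1 <= x <= (s+1)p, limited to 0..s."""
    p = Fraction(p)
    upper = (s + 1) * p   # sp + p
    lower = upper - 1     # sp + p - 1
    return list(range(max(0, ceil(lower)), min(s, floor(upper)) + 1))

print(binomial_modes(50, Fraction(1, 6)))  # die tossed 50 times
print(binomial_modes(53, Fraction(1, 6)))  # die tossed 53 times
print(binomial_modes(5, Fraction(1, 6)))   # the case s = 5, p = 1/6
```

A floating-point value such as `1 / 6` is converted to the nearest binary fraction, which is not exactly one-sixth, so the tie in the second and third cases would then be missed; this is the reason for preferring `Fraction` here.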

10.4 Moments of the Binomial Distribution. The binomial distribution
for assumed values of $p$ and $s$ is a theoretical discrete distribution with $p$ and
$s$ as parameters, so that according to the convention of distinguishing parameters
by Greek letters, we should write, say, $\theta$ and $\nu$ instead of $p$ and $s$.
(The symbol $\pi$ is sometimes used, as the Greek form of $p$, but there is a risk
of confusion with the customary meaning of $\pi$.) We shall deal in the next
chapter with the problem of samples from a population in which the variate
is assumed to be binomial, with a probability of success which is estimated
from the relative frequency of success in the sample, and it will then be necessary
to distinguish between the relative frequency $p$ and the probability $\theta$
of which $p$ is an estimate. In the present section, therefore, we will rewrite
the binomial distribution law as

(10.9)    $f(x) = C(s, x)\,\theta^x (1 - \theta)^{s-x}$

which brings out the fact that $\theta$ is a parameter. It is true that $s$ is also a
parameter, but in most practical applications $s$ is an integer determined by
the nature of the problem and is not estimated from the sample.

By the usual formulas for a discrete distribution, the mean (or expectation)
of $x$ is given by

(10.10)    $\mu = \sum_{i=0}^{s} x_i f(x_i)$

and the variance by

(10.11)    $\sigma^2 = \sum_{i=0}^{s} x_i^2 f(x_i) - \mu^2$

For the binomial distribution,

$x_0 = 0,\ x_1 = 1,\ x_2 = 2,\ \ldots,\ x_s = s$

so that

(10.12)    $\mu = \sum_{k=0}^{s} k\, \frac{s!}{k!(s - k)!}\, \theta^k (1 - \theta)^{s-k} = \sum_{k=1}^{s} \frac{s!}{(k - 1)!(s - k)!}\, \theta^k (1 - \theta)^{s-k}$

since the term with $k = 0$ vanishes. Also,

(10.13)    $\sigma^2 = \sum_{k=0}^{s} k^2\, \frac{s!}{k!(s - k)!}\, \theta^k (1 - \theta)^{s-k} - \mu^2$

Now, by the binomial theorem,

(10.14)    $\{\theta + (1 - \theta)\}^s = \sum_{k=0}^{s} C(s, k)\,\theta^k (1 - \theta)^{s-k} = 1$

since $\theta + (1 - \theta) = 1$.

From (10.12), taking $s$ as a factor out of $s!$ and $\theta$ out of $\theta^k$, we have

$\mu = s\theta \sum_{k=1}^{s} \frac{(s - 1)!\,\theta^{k-1}(1 - \theta)^{s-k}}{(k - 1)!(s - k)!}$
$= s\theta \sum_{k-1=0}^{s-1} C(s - 1, k - 1)\,\theta^{k-1}(1 - \theta)^{s-1-(k-1)}$
$= s\theta$

by (10.14), with $k - 1$ instead of $k$ and $s - 1$ instead of $s$. We obtain,
therefore, the important relation

(10.15)    $\mu = s\theta$

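To calculate $\sigma^2$ from (10.13) it is convenient to write $k^2 = k(k - 1) + k$,
since $k$ and $k - 1$ are factors of $k!$. Then

$\sigma^2 = \sum_{k=0}^{s} \frac{\{k(k - 1) + k\}\, s!}{k!(s - k)!}\, \theta^k (1 - \theta)^{s-k} - \mu^2$

The second term in the bracket, $k$, gives precisely $\mu$. The first term gives, on
canceling $k(k - 1)$ into $k!$,

$\sum_{k=2}^{s} \frac{s!\,\theta^k (1 - \theta)^{s-k}}{(k - 2)!(s - k)!} = s(s - 1)\theta^2 \sum_{k-2=0}^{s-2} \frac{(s - 2)!\,\theta^{k-2}(1 - \theta)^{s-k}}{(k - 2)!(s - k)!} = s(s - 1)\theta^2$

by (10.14), with $s - 2$ and $k - 2$ instead of $s$ and $k$.

Therefore    $\sigma^2 = s(s - 1)\theta^2 + \mu - \mu^2$
             $= s(s - 1)\theta^2 + s\theta - s^2\theta^2$
             $= s(\theta - \theta^2)$
(10.16)      $= s\theta(1 - \theta)$

The standard deviation of the binomial distribution is therefore $[s\theta(1 - \theta)]^{1/2}$.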

Higher moments can be calculated by the same method, but can be more 
simply obtained by a recursion formula. (See Part Two, §2.8.) Here we 
simply mention that the third moment is 

(10.17)    $\mu_3 = s\theta(1 - \theta)(1 - 2\theta)$

whence the moment measure of skewness is

(10.18)    $\alpha_3 = (1 - 2\theta)/[s\theta(1 - \theta)]^{1/2} = (1 - 2\theta)/\sigma$

The binomial distribution is therefore always skew unless $\theta = \tfrac{1}{2}$.
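The formulas for the mean, variance, and third moment can be verified by direct summation over the distribution. The following is a minimal sketch in Python, not part of the original text; the values $s = 10$ and $\theta = \tfrac{1}{3}$ are chosen merely for illustration, and the function names are assumptions. Exact fractions are used so that the comparison is free of rounding.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(s, theta):
    """Return [f(0), ..., f(s)] with f(k) = C(s, k) theta^k (1 - theta)^(s - k)."""
    return [comb(s, k) * theta ** k * (1 - theta) ** (s - k) for k in range(s + 1)]

def central_moments(s, theta):
    """Mean, variance, and third moment about the mean, by direct summation."""
    f = binomial_pmf(s, theta)
    mu = sum(k * fk for k, fk in enumerate(f))
    var = sum((k - mu) ** 2 * fk for k, fk in enumerate(f))
    mu3 = sum((k - mu) ** 3 * fk for k, fk in enumerate(f))
    return mu, var, mu3

s, theta = 10, Fraction(1, 3)   # illustrative values only
mu, var, mu3 = central_moments(s, theta)

print(mu, s * theta)                                       # mean
print(var, s * theta * (1 - theta))                        # variance
print(mu3, s * theta * (1 - theta) * (1 - 2 * theta))      # third moment
```

Each printed pair should show the same fraction twice, the first obtained by summation and the second from the corresponding formula.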

10.5 Fitting a Binomial to a Given Distribution. A simple experiment in
sampling may be conducted by a class as follows: A mixture of 100 red and
200 white balls (identical except for color) is placed in a box. These may be
wooden balls, about 1 cm in diameter. Samples of 10 balls are extracted by
stirring up the contents and inserting a small wooden paddle, in the upper
side of which are 10 holes, into which the balls will fit easily. By dipping
the paddle into the mixture and bringing it up, a sample of 10 is easily examined.
The number of red balls in the sample is noted, and the balls are then
returned to the box, and the procedure is repeated. It does not take long for
a class of students to compile three or four hundred samples in this way, and
the results can then be set out as a frequency distribution. A distribution
so obtained is shown in Table 35, columns 1 and 2, where $x$ is the number of
red balls in the sample and $f_o$ the observed frequency of $x$.


Table 35. Binomial Distribution Fitted to Sampling Results

    x      f_o    f_c (θ = 1/3)   f_c (θ = 0.36)    x f_o    x² f_o
    0        2          6.1             4.0             0         0
    1       22         30.3            22.7            22        22
    2       63         68.3            57.5           126       252
    3       76         91.0            86.2           228       684
    4       96         79.7            84.8           384      1536
    5       56         47.8            57.3           280      1400
    6       26         19.9            26.8           156       936
    7        8          5.7             8.6            56       392
    8        1          1.1             1.8             8        64
    9        0          0.1             0.2             0         0
   10        0          0.0             0.0             0         0

Total      350        350.0           349.9          1260      5286


This method of sampling does not correspond exactly to a binomial situa- 
tion, since the 10 balls are taken out at once. Strictly, the balls should be 
picked one at a time and each one returned to the box after its color has been 
noted, but this takes too long and the method described above is a good 
approximation. 

We can now compare the distribution obtained with a theoretical binomial 
distribution. Since exactly one-third of the balls in the box are red, and since 
every ball has an equal chance (approximately) of being included in the 
sample, we can take the theoretical probability $\theta$ as $\tfrac{1}{3}$. The probability of
$x$ red balls in a sample of 10 is given by

(10.19)    $f(x) = C(10, x)(\tfrac{1}{3})^x (\tfrac{2}{3})^{10-x}$

so that the calculated frequency $f_c$ in 350 samples is

(10.20)    $f_c = 350 f(x)$

The values of $f_c$ are given in column 3 of Table 35. Thus,

$f_c(0) = 350 (\tfrac{2}{3})^{10} = 6.07$
$f_c(1) = \tfrac{10}{1} \cdot \tfrac{1}{2} f_c(0) = 30.35$
$f_c(2) = \tfrac{9}{2} \cdot \tfrac{1}{2} f_c(1) = 68.28$
etc.
The observed and calculated frequencies correspond roughly, but the agreement
does not look to be remarkably good. A method of judging how good
it is (the "chi-square" test) will be given in Chapter 13. Meanwhile, we can
calculate the mean and variance of the sample distribution and compare
them with the theoretical values.

The distribution mean $\bar{x}$ is found by calculating $\sum x f_o = 1260$ (see column 5
of Table 35). Then $\bar{x} = \frac{1260}{350} = 3.60$, and this may be taken as the estimate
of $\mu$ for the population consisting of all possible samples of 10 from the collection
of red and white balls. Actually 350 samples were obtained, but these
form only a very small fraction of all the samples that could be taken with
unlimited time and patience. The distribution variance is given by

$s_x^2 = \frac{\sum x^2 f_o}{350} - \bar{x}^2 = \frac{5286}{350} - (3.60)^2 = 15.103 - 12.96 = 2.143$

and an unbiased estimate of $\sigma^2$ is $\frac{350}{349} s_x^2 = 2.149$.

The theoretical values of $\mu$ and $\sigma^2$ are $s\theta = 3.33$ and $s\theta(1 - \theta) = 2.22$,
respectively.

Without assuming the value $\tfrac{1}{3}$ for $\theta$, we can fit a binomial to our frequency
distribution in the same way as we fitted a normal curve to a given distribution,
namely, by estimating the parameter $\theta$ from the distribution itself. That
is, we take for $\theta$ the value $\bar{x}/s = 0.36$. The calculated frequencies are now
given by

(10.21)    $f_c = 350\, C(10, x)(0.36)^x (0.64)^{10-x}$

Then    $f_c(0) = 350 (0.64)^{10} = 4.035$
        $f_c(1) = 22.70$
etc.

The values are given in column 4 of Table 35. They agree much better 
than before with the observed values. The mean is, of course, now identical 
with the observed mean because it was made to be so. The new theoretical 
variance is 10 X 0.36 X 0.64 = 2.30, so that the agreement in this respect 
is rather worse than before. 
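The whole fitting procedure can be carried out in a few lines. The sketch below is an illustration in Python, not part of the original text; the observed frequencies are those of column 2 of Table 35 (the value 56 for $x = 5$ being the one consistent with the column totals 350, 1260, and 5286), and the names `observed`, `expected`, and `theta_hat` are assumptions made for the example.

```python
from math import comb

# Observed frequencies f_o for x = 0, 1, ..., 10 red balls (Table 35, column 2).
observed = [2, 22, 63, 76, 96, 56, 26, 8, 1, 0, 0]
s = 10                          # balls per sample
n_samples = sum(observed)       # number of samples

mean = sum(x * f for x, f in enumerate(observed)) / n_samples
variance = sum(x * x * f for x, f in enumerate(observed)) / n_samples - mean ** 2
unbiased = variance * n_samples / (n_samples - 1)
theta_hat = mean / s            # estimate of theta from the sample itself

def expected(theta):
    """Calculated frequencies f_c = N * C(s, x) theta^x (1 - theta)^(s - x)."""
    return [n_samples * comb(s, x) * theta ** x * (1 - theta) ** (s - x)
            for x in range(s + 1)]

fc_third = expected(1 / 3)       # theoretical value theta = 1/3
fc_fitted = expected(theta_hat)  # theta estimated from the data

print(f"mean = {mean:.2f}, variance = {variance:.3f}, unbiased = {unbiased:.3f}")
for x in range(s + 1):
    print(f"{x:>2} {observed[x]:>4} {fc_third[x]:>7.1f} {fc_fitted[x]:>7.1f}")
```

The two columns of calculated frequencies printed by the loop may be compared with columns 3 and 4 of Table 35.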

10.6 The Poisson Distribution. Situations sometimes occur where the
probability $\theta$ of a certain event is very small, but where nevertheless the number
of trials $s$ is so large that the expected number of occurrences of the event
is of moderate size, say between 0.1 and 10. Examples are the number of
persons born blind per year in a large city, the number of typographical errors
made by a good typist in a large number of typed pages, the number of bridge
hands containing 4 aces in an evening of play at a bridge club. The events
here are "rare events," that is, individually. The probability of a blind birth
is very small, but the number of persons born per year in a large city is large,
so that such births taken over the whole city and the whole year are not rare.
In situations like these, the true binomial expression for the probability of
occurrence of $x$ events can be replaced by a convenient approximation due to
S. D. Poisson (1781-1840) and known as the Poisson distribution.

We suppose, then, that $\theta$ is small and $s$ large, in such a way that $\lambda = s\theta$ is
a number of the order of unity. We want to find an expression for the
binomial probability,

$f(x) = \frac{s!}{x!(s - x)!}\, \theta^x (1 - \theta)^{s-x}$
$= \frac{s(s - 1)(s - 2) \cdots (s - x + 1)}{x!} \cdot \frac{\lambda^x}{s^x} \cdot \frac{(1 - \lambda/s)^s}{(1 - \lambda/s)^x}$

The number of factors in the numerator of the first fraction is $x$, and if we
divide each of them by $s$, we account for the term $s^x$ in the denominator of
$f(x)$. Therefore,

(10.22)    $f(x) = \frac{1 \cdot (1 - \frac{1}{s})(1 - \frac{2}{s}) \cdots (1 - \frac{x - 1}{s})}{x!} \cdot \frac{\lambda^x (1 - \lambda/s)^s}{(1 - \lambda/s)^x}$

Now, for a fixed value of $x$, we let $s$ become very large, tending to infinity.
Then all the fractions $\frac{1}{s}, \frac{2}{s}, \ldots, \frac{x - 1}{s}$ become very small, and all the
terms like $1 - \frac{1}{s}$, $1 - \frac{2}{s}$ are practically equal to 1. When
$(1 - \lambda/s)$ is raised to a fixed power $x$, it is still practically 1, but in the numerator
of (10.22) it is raised to the power $s$, which is very large and ultimately
infinite. It is shown in most textbooks on elementary calculus that the limit
of $(1 - \lambda/s)^s$ as $s \to \infty$ is $e^{-\lambda}$, where $e$ is the number (2.71828···) which we
have already encountered in the normal law and which is the base of natural
logarithms. Hence, when $s \to \infty$,

(10.23)    $f(x) \to \frac{\lambda^x}{x!}\, e^{-\lambda} = P(x, \lambda)$

which is Poisson's function. For $s$ large but finite, the expression on the
right-hand side of (10.23) can be taken as a good approximation to $f(x)$.
The advantages are that it is comparatively easy to calculate and that good
tables of $P(x, \lambda)$ exist. (See Reference 1.)
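The closeness of the approximation can be examined numerically. The following sketch in Python is an illustration only, not part of the original text; it holds $\lambda = s\theta$ fixed at the arbitrary value 2 and compares the binomial probabilities with Poisson's function for increasing $s$. The function names and the range $x = 0, \ldots, 10$ are assumptions made for the example.

```python
from math import comb, exp, factorial

def binomial_pmf(x, s, theta):
    """Exact binomial probability C(s, x) theta^x (1 - theta)^(s - x)."""
    return comb(s, x) * theta ** x * (1 - theta) ** (s - x)

def poisson_pmf(x, lam):
    """Poisson's function P(x, lam) = lam^x e^(-lam) / x!."""
    return lam ** x * exp(-lam) / factorial(x)

def largest_gap(s, lam, x_max=10):
    """Largest absolute difference between the two laws for x = 0..x_max."""
    theta = lam / s
    return max(abs(binomial_pmf(x, s, theta) - poisson_pmf(x, lam))
               for x in range(x_max + 1))

lam = 2.0   # arbitrary illustrative value of s * theta
for s in (10, 100, 10_000):
    print(f"s = {s:>6}: largest difference = {largest_gap(s, lam):.6f}")
```

The differences should diminish steadily as $s$ increases with $\lambda$ held fixed, in accordance with the limiting argument above.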
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The Poiason distribution is, like the binomial, a discrete distribution, smoe 
X must be a normegative integer. There is no upper limit on x, but the prob- 
abilities of large values of x are extremely small. 

As in §10.3 it is easily proved that the mode of the Poisson distribution is 
the integer lying between X — 1 and X, unless X happcms to l>e an integer, in 
which case the values for X and X — 1 are equal. 
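
The quality of the approximation (10.23) can be checked numerically. The following is a minimal sketch in Python, not part of the original text; the function names and the values s = 1000 and θ = 0.002 are illustrative assumptions chosen so that λ = sθ = 2.

```python
from math import comb, exp, factorial


def binomial_pmf(x, s, theta):
    """Exact binomial probability f(x) of x events in s trials."""
    return comb(s, x) * theta**x * (1 - theta) ** (s - x)


def poisson_pmf(x, lam):
    """Poisson's function P(x, lambda) of (10.23)."""
    return lam**x * exp(-lam) / factorial(x)


# Illustrative values only (an assumption, not data from the text):
# s is large and theta is small, so that lambda = s * theta is of order unity.
s, theta = 1000, 0.002
lam = s * theta

# Largest absolute difference between the two laws for x = 0, 1, ..., 10.
largest_gap = max(
    abs(binomial_pmf(x, s, theta) - poisson_pmf(x, lam)) for x in range(11)
)

for x in range(6):
    print(x, round(binomial_pmf(x, s, theta), 5), round(poisson_pmf(x, lam), 5))
print("largest gap for x = 0..10:", largest_gap)
```

With these values the two columns agree to about three decimal places; making θ larger for the same λ (that is, s smaller) widens the gap.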

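10.7 Moments of the Poisson Distribution. If we add the values of
$P(x, \lambda)$ for all values of x, we obtain

$$\sum_{x=0}^{\infty} P(x, \lambda) = e^{-\lambda}\left(1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \cdots\right)$$

Now, it can be proved that the series in parentheses has a sum for any value
of $\lambda$, and this sum is $e^{\lambda}$. Therefore

$$\sum_{x=0}^{\infty} P(x, \lambda) = 1 \tag{10.24}$$

which is, of course, what we should expect if $P(x, \lambda)$ is actually the
probability of the value x.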

The expectation of x is given by

$$\mu = \sum_{x=0}^{\infty} x\,P(x, \lambda) = \sum_{x=1}^{\infty} \frac{\lambda^x e^{-\lambda}}{(x-1)!} = \lambda e^{-\lambda}\left[1 + \lambda + \frac{\lambda^2}{2!} + \cdots\right] = \lambda \tag{10.25}$$

The expected value of x is therefore $\lambda$ or $s\theta$, as with the binomial law.
The variance of x is

$$\sigma^2 = \sum_{x=0}^{\infty} \frac{\{x(x-1) + x\}\,\lambda^x e^{-\lambda}}{x!} - \lambda^2 = \lambda^2 \sum_{x=0}^{\infty} \frac{\lambda^x e^{-\lambda}}{x!} + \lambda - \lambda^2 = \lambda(\lambda + 1) - \lambda^2 = \lambda$$

by (10.24) and (10.25).

The variance of x is therefore also equal to $\lambda$. This fact, that the
variance is equal to the mean, is a characteristic of the Poisson distribution.

Higher moments can be found in the same way. It turns out that the
third moment is again equal to $\lambda$, so that the skewness $\alpha_3$ is
$\lambda^{-1/2}$. The kurtosis $\gamma_2$ ($= \alpha_4 - 3$) is $\lambda^{-1}$.
For large values of $\lambda$, the skewness and kurtosis are both nearly zero,
and the distribution can then be closely fitted by a normal curve.
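
These results can be verified by direct summation. The sketch below is an illustration only, not the book's method: it truncates the infinite series at an assumed 200 terms (ample for moderate λ) and builds the probabilities by repeated multiplication to avoid large factorials. The function name is hypothetical.

```python
from math import exp


def poisson_moments(lam, terms=200):
    """Mean, variance, skewness and kurtosis of P(x, lam) by direct summation.

    The series is truncated after `terms` terms, an assumption that is
    harmless when lam is small compared with `terms`.
    """
    probs = []
    p = exp(-lam)  # P(0, lam)
    for x in range(terms):
        probs.append(p)
        p *= lam / (x + 1)  # P(x + 1, lam) from P(x, lam)
    mean = sum(x * q for x, q in enumerate(probs))
    m2 = sum((x - mean) ** 2 * q for x, q in enumerate(probs))
    m3 = sum((x - mean) ** 3 * q for x, q in enumerate(probs))
    m4 = sum((x - mean) ** 4 * q for x, q in enumerate(probs))
    skewness = m3 / m2**1.5
    kurtosis = m4 / m2**2 - 3.0
    return mean, m2, skewness, kurtosis


lam = 4.17  # any moderate value will do
mean, variance, skewness, kurtosis = poisson_moments(lam)
print(mean, variance, skewness, kurtosis)
print(lam, lam, lam**-0.5, 1.0 / lam)  # the theoretical values for comparison
```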

10.8 Fitting a Poisson Distribution to a Given Empirical Distribution.

A sampling experiment similar to the binomial one described in §10.5 can be
carried out with colored balls, using, say, a mixture of 100 red balls and 1100
white ones, and taking samples of 50 with a larger paddle. The probability
of picking a red ball is about 1/12 and therefore $\lambda = 50/12 = 4.17$. Here
we have neither a very small $\theta$ nor a very large s, but the experiment does
show that the Poisson approximation is reasonably good. The results of
one such classroom experiment are shown in Table 36. No observed value
of x in the 300 samples taken was larger than 9, but the theoretical values
continue for x = 10, 11, 12, ..., and so the values for all x greater than or equal
to 9 are lumped together for convenience.

The distribution mean is $\bar{x} = 1324/300 = 4.413$, which is an estimate of
$\lambda$, and is somewhat higher than the theoretical value 4.17 based on the
assumption that all the balls are equally likely to be picked. The distribution
variance is $s_x^2 = 3.756$, and the estimated $\sigma^2$ is therefore 3.768,
somewhat lower than the theoretical $\lambda$.


Table 36. Fitting of Poisson Distribution

    x      f_o    f_c (λ = 4.17)    f_c (λ = 4.41)    x f_o    x² f_o
    0        1          4.7               3.6             0         0
    1       16         19.4              16.0            16        16
    2       36         40.4              35.4            72       144
    3       48         56.1              52.1           144       432
    4       62         58.4              57.5           248       992
    5       51         48.7              50.7           255      1275
    6       41         33.8              37.3           246      1476
    7       22         20.1              23.5           154      1078
    8       18         10.5              13.0           144      1152
   ≥9        5          7.9              10.9            45       405
  Total    300        300.0             300.0          1324      6970


The frequencies $f_c$ are calculated by the formula

$$f_c = \frac{300\,\lambda^x e^{-\lambda}}{x!}$$

for the assumed value of $\lambda$. If Molina's tables (Reference 1) are not
available, $e^{-\lambda}$ can be found from a table of $e^{-x}$ (or by
logarithms; the common logarithm of e is 0.4342945) and the successive values
of $f_c$ found by a recursion formula. The formula is

$$\frac{f_c(x+1)}{f_c(x)} = \frac{\lambda^{x+1}}{(x+1)!}\cdot\frac{x!}{\lambda^x} = \frac{\lambda}{x+1} \tag{10.26}$$

Then

$$f_c(0) = 300\,e^{-\lambda}, \qquad f_c(1) = \lambda f_c(0), \qquad f_c(2) = \frac{\lambda}{2} f_c(1), \qquad f_c(3) = \frac{\lambda}{3} f_c(2)$$

and so on. In Table 36, columns 3 and 4, these are calculated for
$\lambda = 50/12$, and for $\lambda = 4.413$ (the value estimated from the
distribution itself). For use in this example and the exercises at the end of
the chapter a short table of $e^{-x}$ is given in Table 37.

Table 37. Short Table of $e^{-x}$

   x     0.0     0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9    Factor
   0   1.0000  0.9048  0.8187  0.7408  0.6703  0.6065  0.5488  0.4966  0.4493  0.4066   1
   1    .3679   .3329   .3012   .2725   .2466   .2231   .2019   .1827   .1653   .1496   1
   2    .1353   .1225   .1108   .1003   .0907   .0821   .0743   .0672   .0608   .0550   1
   3    .4979   .4505   .4076   .3688   .3337   .3020   .2732   .2472   .2237   .2024   10^-1
   4    .1832   .1657   .1500   .1357   .1228   .1111   .1005   .0910   .0823   .0745   10^-1
   5    .6738   .6097   .5517   .4992   .4517   .4087   .3698   .3346   .3028   .2739   10^-2
   6    .2479   .2243   .2029   .1836   .1662   .1503   .1360   .1231   .1114   .1008   10^-2
   7    .912    .825    .747    .676    .611    .553    .500    .453    .410    .371    10^-3
   8    .335    .304    .275    .249    .225    .203    .184    .167    .151    .136    10^-3
   9    .123    .112    .101    .091    .083    .075    .068    .061    .055    .050    10^-3

To find $e^{-4.4}$ look under row 4 and column 0.4, and multiply the entry by the
factor in the last column for row 4, namely $10^{-1}$. Then $e^{-4.4} = 0.01228$.

To calculate $e^{-\lambda}$ for $\lambda = 4.413$, we find by linear interpolation
in the table, between the values for 4.4 and 4.5, that $e^{-\lambda} = 0.0121$,
so that $f_c(0) = 3.63$. By logarithms,
$\log_{10} e^{-\lambda} = -4.413 \times 0.43429 = -1.9165 = \bar{2}.0835$, so that
$e^{-\lambda} = 0.0121$, checking the result from the table.
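
The recursion (10.26) translates directly into a short program. The following Python sketch is an illustration under stated assumptions: the function name is hypothetical, the observed frequencies are those of Table 36, the last class (x of 9 or more) is obtained by subtraction from the total, and the sample mean treats that class as x = 9, as the table does.

```python
from math import exp


def poisson_expected(total, lam, last):
    """Expected frequencies for x = 0 .. last-1 by the recursion (10.26),
    followed by one pooled class for all x >= last (found by subtraction)."""
    f = [total * exp(-lam)]  # f_c(0) = total * e^(-lam)
    for x in range(last - 1):
        f.append(f[-1] * lam / (x + 1))  # f_c(x + 1) = f_c(x) * lam / (x + 1)
    f.append(total - sum(f))
    return f


# Observed frequencies f_o of Table 36 for x = 0, 1, ..., 8 and x >= 9.
observed = [1, 16, 36, 48, 62, 51, 41, 22, 18, 5]
total = sum(observed)
mean = sum(x * f for x, f in enumerate(observed)) / total

theoretical = poisson_expected(total, 50 / 12, 9)  # lam from the ball counts
fitted = poisson_expected(total, mean, 9)          # lam estimated from the data

print("sample mean:", round(mean, 3))
for x, (fo, f1, f2) in enumerate(zip(observed, theoretical, fitted)):
    print(x, fo, round(f1, 1), round(f2, 1))
```

Because the pooled class is computed here from unrounded values, it may differ in the last digit from a figure obtained by subtracting rounded entries.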

10.9 Poisson Distribution for Random Events. The Poisson distribution
arises in a number of problems in which events occur over an interval of time
in a random way. For example, in a collection of atoms of a radioactive
element there will be, on the average, a certain number N which will
disintegrate in a time T. The probability that exactly x atoms will disintegrate
in time T is

$$P(x, N) = N^x e^{-N}/x!$$

This may be proved on the assumption that the disintegrations are both
individually and collectively random. This means that the splitting of one
atom has no effect on the chances of disintegration for any other atom, and
also that the rate of decay of the radioactive substance is sufficiently slow
that the probability that an atom will disintegrate in a small time $\delta t$ is
independent of what may have happened during any preceding time.

The same problem arises at a telephone switchboard, where the random
events are incoming calls. It may be assumed that the calls are independent
of each other. The assumption of collective randomness is not so clear,
because there are busy periods and slack periods (such as lunch intervals),
but if the time T minutes is not too long we can suppose that the number of
calls in one minute during this time is independent of what has happened in
earlier minutes of the same period. On these assumptions the probability
that in a time t minutes there will be n calls is

$$P(n, kt) = (kt)^n e^{-kt}/n!$$

where $k = N/T$, the average number of calls per minute during the whole
period.

The same theory applies to the rate at which “clicks” are heard in a Geiger
counter, when placed near a constant source of radioactivity. The data in
Table 38 were obtained for the number of clicks x registered in 10-second
periods in a physics laboratory, only the general “background” radiation
being used. The mean number per 10-second period was 7.952, and the
calculated values of the expected number of times that 0, 1, 2, ... clicks would
be registered in 10 seconds are given in column 3. The general agreement of
the theoretical Poisson distribution with the observed distribution is clear,
and, as we shall see later, this agreement is quite as good as we could expect
even if the assumption of a Poisson distribution for the population of 10-second
intervals is really true.
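
The expected frequencies for the Geiger-counter data follow from the formula P(n, kt) with kt = 7.952. The Python sketch below is illustrative only; the function name is hypothetical, and the rate per second is simply the quoted mean divided by 10, on the assumption of a constant rate.

```python
from math import exp, factorial


def poisson_count_prob(n, rate, t):
    """P(n, kt): probability of n random events in time t at average rate k."""
    kt = rate * t
    return kt**n * exp(-kt) / factorial(n)


rate = 7.952 / 10.0  # mean clicks per second (7.952 per 10-second period)
periods = 500        # number of 10-second periods observed

# Expected number of periods showing n = 0, 1, ..., 15 clicks.
expected = [periods * poisson_count_prob(n, rate, 10.0) for n in range(16)]
# Expected number of periods showing 16 or more clicks, by subtraction.
tail = periods - sum(expected)

for n, value in enumerate(expected):
    print(n, round(value, 1))
print("16 or more:", round(tail, 1))
```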

Table 38. Geiger Counter Readings (Background) in 10-Second Intervals

    x      f_o     f_c
    0        0      0.2
    1        1      1.4
    2        5      5.6
    3       13     14.8
    4       23     29.3
    5       54     46.6
    6       66     61.8
    7       72     70.2
    8       64     69.8
    9       67     61.6
   10       47     49.0
   11       35     35.4
   12       26     23.5
   13       16     14.4
   14        5      8.2
   15        1      4.3
   16        1
   17        1
   18        2      3.9 (x = 16 to 20 combined)
   19        0
   20        1
  Total    500    500.0


1. In the identity

$$\left(\frac{2}{5} + \frac{3}{5}\right)^6 = \sum_{x=0}^{6} C(6, x)\left(\frac{2}{5}\right)^x\left(\frac{3}{5}\right)^{6-x}$$

find the sum of the terms for which x has the values 1, 2, 3. Ans. 12,096/15,625.

2. If ten coins are tossed, what is the probability that there are (a) exactly 3 heads
(b) not more than 3 heads? Assume that the probability of head with each coin is 1/2.

Ans. (a) 120/1024; (b) 176/1024.

3. Assume that the probability that a bomb dropped from an airplane will strike a certain
target is 1/5. If six bombs are dropped find the probability that (a) exactly two will strike
the target (b) at least two will strike the target. Ans. (a) 768/3125; (b) 1077/3125.

4. Find the term independent of x in the binomial expansion of $(x - 1/x^2)^{3n}$.

Ans. $(-1)^n C(3n, n)$.

Hint. Write down the (p + 1)th term (the term in ...) and choose p so that the
exponent of x in the whole term is 0.


5. Prove that the greatest value of C(2n, x) is when x = n.

6. Show that the number of permutations which can be formed from 2n letters, all of
these letters being a's or b's, is greatest when the number of a's is equal to the number
of b's.

Hint. If there are x a's the number of permutations is C(2n, x). Use exercise 5.

7. An anti-aircraft battery had 3 out of 5 successes in shooting down “flying bombs” that
came within range. What is the chance that if 8 bombs came within range not more than 2
got through? Ans. 0.316.


8. (a) Find the values of C(15, x) for x = 0 to x = 15, by writing out 6 additional rows
in Fig. 34 (Pascal's Triangle).

(b) Evaluate the terms corresponding to x = 5 and x = 6 in $C(15, x)\left(\frac{1}{3}\right)^x\left(\frac{2}{3}\right)^{15-x}$.

(c) Show that the expression in (b) is the probability of throwing x fives or sixes with
15 dice.

9. If the probability of x is as given in Exercise 8, what is the expected value of x and its
standard deviation? What is the skewness of the distribution?

10. Assume that 0.04 is the theoretical rate of mortality in a certain age group. Suppose
an insurance company is carrying s = 1000 such cases. What is the standard deviation
of the death rate (i.e., of x/s, where x is the number of deaths)? What would it be if
s = 10,000?

Hint. If the expectation of x is $s\theta$, the expectation of x/s is $\theta$. If the variance of x is
$s\theta(1 - \theta)$, the variance of x/s is $[s\theta(1 - \theta)]/s^2 = \theta(1 - \theta)/s$. See (11.1).

11. Four thumbtacks were tossed on a table 100 times. The number x that fell point up
in each of the 100 trials was noted. The results were:

    x       0     1     2     3     4
    f_o     4    30    36    25     5

(a) Estimate the probability $\theta$ that a thumbtack will fall point up.

(b) Fit a binomial distribution, with the estimated $\theta$, to the observed distribution, by
calculating the expected frequencies in 100 trials, $f_c$, corresponding to the observed $f_o$.

12. A factory turns out articles of a standardized type at the rate of 1000 per day.
Experience shows that on the average 0.2% of each day's production is defective. Show that
it is rather unusual for any day's production to include more than four defective articles.

13. Assume that the chance of an individual coal miner being killed in a mine accident
during a year is 1/1400. Use the Poisson law to calculate the probability that in a mine
employing 350 miners there will be at least one fatal accident in a year. Ans. 0.22.

14. A retailer with limited storage space finds that, on the average, he is able to sell
2 boxes of parrot food per week. He replenishes his stock every Monday morning so as to
start each week with four boxes on hand.

(a) What is the probability that he sells his entire stock in a week?

(b) What is the probability that he is unable to fill at least one order?

(c) With how many boxes should he start the week to ensure that the probability of
being able to fill all orders shall be at least 0.99? Ans. (a) 0.143; (b) 0.053; (c) 6.

15. The probability that a man aged 35 will die before reaching the age of 40 may be
taken as 0.018. Out of a group of 40 men, now aged 35, what is the approximate probability
that x will die within the next 5 years? Draw up a table of the probabilities for different
values of x.

16. (Wallis, quoted by Wilks). The following table shows the distribution of the
numbers of vacancies occurring per year in the U. S. Supreme Court during the years 1837
to 1932.

    Vacancies per Year    Frequency
            0                 59
            1                 27
            2                  9
            3                  1

Fit a Poisson distribution to this observed distribution.

17. (Bortkiewicz). A classical example of the Poisson distribution of rare events is that
of the deaths of Prussian cavalry soldiers from the kicks of horses during the twenty years
1875-1894. The frequency distribution of the number of such deaths in 10 army corps,
per corps per annum, was

    Deaths    Frequency
       0         109
       1          65
       2          22
       3           3
       4           1

Show that the mean number of deaths per corps per annum was 0.61. Fit a Poisson
distribution and calculate the theoretical frequencies. (The agreement turns out to be very
good.)

CHAPTER XI

SIGNIFICANCE TESTS FOR BINOMIAL POPULATIONS

11.1 Approximation of the Binomial Distribution by a Normal Distribution. 

For many problems connected with binomial distributions the exact solutions
are cumbersome and tedious to calculate. Good approximations, however,
may often be obtained by making use of the fact that, for large values of s
and for values of $\theta$ not too near 0 or 1, the binomial distribution may be
replaced for practical purposes by a normal distribution.

Consider, for example, the following problem: A coin is tossed 100 times.
What is the probability that the number of heads will lie between 40 and 60
inclusive?

The probability of exactly x heads in 100 tosses, assuming that the
probability of head on a single toss is 1/2, is $C(100, x)\left(\frac{1}{2}\right)^{100}$.
The required probability is therefore

$$\sum_{x=40}^{60} \frac{100!}{x!\,(100 - x)!}\left(\frac{1}{2}\right)^{100}$$

and so is equal to the sum of 21 terms. The exact calculation and addition
of all these terms is laborious.

If the probability $\theta$ of success in a single trial is 1/2, the mean of x in s trials is
s/2 and the standard deviation is $(s/4)^{1/2}$. As s increases, therefore, the mean
of the distribution increases and the dispersion also increases. The distribution
moves to the right along the x axis and grows flatter, since the total area
remains equal to unity. This is illustrated in Fig. 39 drawn for s = 9, 10
and 26. If, however, we standardize the variable by changing to

$$z = (x - s/2)/(s/4)^{1/2} = 2x s^{-1/2} - s^{1/2}$$

and keep the area constant by multiplying the ordinates of the distribution
by $(s/4)^{1/2}$, we avoid this change in position and flattening out, and the
distribution approaches nearer and nearer, as s increases, to the familiar shape
of the normal curve. (See Fig. 40.) For clearness, the figures are drawn as
frequency polygons instead of histograms.

Fig. 39. Binomial Distribution ($\theta = 1/2$)

Fig. 40. Binomial Distribution (Standardized)


The rigorous proof that the limiting form of the binomial law is the normal
law requires quite advanced mathematics. A proof which, while not rigorous,
is illuminating and can be followed by the student who knows a little calculus
may be found in Part Two, Chapter II. Here we shall simply accept the
result that for large values of s the histogram of the binomial distribution is
approximated by a normal curve, and that the approximation is much better
when $\theta$ is near 1/2 than when it is near 0 or 1.

Each term of the binomial distribution is represented in the histogram by
a rectangle of unit base and height equal to f(x). The sum of the terms from
x = a to x = b inclusive is therefore the combined area of the histogram
from x = a − 1/2 to x = b + 1/2, since the base of the rectangle corresponding
to a stretches from a − 1/2 to a + 1/2 and the base of the rectangle corresponding
to b stretches from b − 1/2 to b + 1/2. This area is shaded in Fig. 41. If now
the histogram is approximated by a normal curve with the same mean and
standard deviation, and of course the same area, the sum of the binomial
terms from x = a to x = b may be approximated by the area under the
normal curve from x = a − 1/2 to x = b + 1/2. The mean of the normal curve
will be $\mu = s\theta$ and the variance will be $\sigma^2 = s\theta(1 - \theta)$.
The area from a − 1/2 to b + 1/2 will be the corresponding area under the
standard normal curve from $z_1 = (a - \frac{1}{2} - s\theta)/\sigma$ to
$z_2 = (b + \frac{1}{2} - s\theta)/\sigma$, and this is

$$\frac{1}{\sqrt{2\pi}} \int_{z_1}^{z_2} e^{-z^2/2}\,dz = \Phi(z_2) - \Phi(z_1)$$

Fig. 41



For the problem near the beginning of this section, concerning the
probability of a number of heads between 40 and 60 inclusive in 100 tosses of a
coin, we have $s\theta = 50$, and $\sigma = (100/4)^{1/2} = 5$, so that

$$z_1 = (40 - 0.5 - 50)/5 = -2.1$$
$$z_2 = (60 + 0.5 - 50)/5 = 2.1$$

The approximate probability is

$$\frac{1}{\sqrt{2\pi}} \int_{-2.1}^{2.1} e^{-z^2/2}\,dz = 0.964$$


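Example 1. Find the sum of the terms of $\left(\frac{2}{5} + \frac{3}{5}\right)^6$ for x = 1, 2, 3, 4 and compare with
the normal approximation.

By the binomial theorem,

$$\left(\frac{2}{5} + \frac{3}{5}\right)^6 = \sum_{x=0}^{6} C(6, x)\left(\frac{2}{5}\right)^x\left(\frac{3}{5}\right)^{6-x}$$

so that the terms for x = 1, 2, 3, 4 are

$$6\left(\frac{2}{5}\right)\left(\frac{3}{5}\right)^5 = 2916/15625$$
$$15\left(\frac{2}{5}\right)^2\left(\frac{3}{5}\right)^4 = 4860/15625$$
$$20\left(\frac{2}{5}\right)^3\left(\frac{3}{5}\right)^3 = 4320/15625$$
$$15\left(\frac{2}{5}\right)^4\left(\frac{3}{5}\right)^2 = 2160/15625$$

the sum being 14256/15625 = 0.9124.

For the normal approximation, $\mu = 6 \times \frac{2}{5} = 2.4$ and
$\sigma^2 = 6 \times \frac{2}{5} \times \frac{3}{5} = 1.44$, so that

$$z_1 = \frac{0.5 - 2.4}{1.2} = -1.58 \quad \text{and} \quad z_2 = \frac{4.5 - 2.4}{1.2} = 1.75$$

The area from $z_1$ to $z_2$ is 0.95994 − 0.05705 = 0.9029. The error is about 1%,
which shows that the normal approximation is not bad even for as small a value
of s as 6.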
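
The comparison made in this section can be repeated for any a, b, s and θ. The following Python sketch is offered as an illustration, not as the book's procedure; the function names are hypothetical, and the standard normal area Φ(z) is computed from the error function in the standard library instead of being read from tables.

```python
from math import comb, erf, sqrt


def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def exact_sum(a, b, s, theta):
    """Sum of the binomial terms from x = a to x = b inclusive."""
    return sum(
        comb(s, x) * theta**x * (1 - theta) ** (s - x) for x in range(a, b + 1)
    )


def normal_approx(a, b, s, theta):
    """Normal approximation with the half-unit correction at each end."""
    mu = s * theta
    sigma = sqrt(s * theta * (1 - theta))
    z1 = (a - 0.5 - mu) / sigma
    z2 = (b + 0.5 - mu) / sigma
    return phi(z2) - phi(z1)


# The coin problem: between 40 and 60 heads inclusive in 100 tosses.
coin = (exact_sum(40, 60, 100, 0.5), normal_approx(40, 60, 100, 0.5))
# Example 1: terms for x = 1, 2, 3, 4 with s = 6 and theta = 2/5.
example1 = (exact_sum(1, 4, 6, 0.4), normal_approx(1, 4, 6, 0.4))

print("coin:      exact %.4f, normal %.4f" % coin)
print("Example 1: exact %.4f, normal %.4f" % example1)
```

The figure obtained for Example 1 may differ slightly in the last decimal place from the one in the text, because the program does not round $z_1$ to two decimals before finding the area.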


11.2 Significance of an Observed Proportion. Suppose that a population
is divided into two categories, those which have and those which do not have
a certain characteristic. For example, the population may consist of all
registered births in 1952 in the state of Wisconsin, and the characteristic may
be that of being male. Or the population may be tosses of a coin and the
characteristic “heads.” If in the whole population there is a proportion $\theta$ of
individuals having this characteristic, the probability is $\theta$ that an individual
picked at random will have the characteristic. The probability that in a
sample of s individuals picked at random there will be x with the
characteristic in question, is given by the binomial function

$$f(x) = C(s, x)\,\theta^x (1 - \theta)^{s-x}$$

Now the value of $\theta$ is usually unknown, but it can be estimated from the
sample, and the larger the sample, the better will be the estimate. We will
suppose that p is the proportion of individuals in the sample having the
characteristic so that p = x/s. Obviously, if the sample s is so large that it
includes the whole population, p will coincide with $\theta$, but for small samples
p will vary from one sample to another.

The expectation of x is, as we have seen, given by $\mu = s\theta$, so that, since s
is a fixed number (the sample size), the expectation of p is $\mu/s = \theta$. The
variance of p is

$$E\left(\frac{x}{s} - \theta\right)^2 = \frac{1}{s^2}\,s\theta(1 - \theta) = \theta(1 - \theta)/s \tag{11.1}$$

since, by (10.16), Var (x) = $s\theta(1 - \theta)$. The variance of p naturally diminishes
as the sample size increases.

If now we assume, on grounds of general knowledge or experience or in
virtue of some hypothesis which we wish to test, that $\theta$ has a certain value,
we can estimate the probability that a sample of size s, taken at random
from the population, will have a value of p differing from $\theta$ by any given
amount. If the sample is large enough, the normal approximation to the
binomial will enable us to estimate this probability quite easily. Thus,
suppose we want to test the hypothesis that the probability of a male birth
in Alberta is 0.5 exactly. We examine the registration statistics of the last
200 births (excluding stillbirths) in the province and find that there were,
say, 110 males. The question arises whether this discrepancy from the
expected value of 100 is large enough to make us doubt the theory that $\theta = 0.5$.

To answer this question we calculate the probability that, if $\theta$ is really 0.5,
a random sample of 200 births would give a value of p at least as different
from 0.5 as the value of 0.55 which we actually found, or in other words, that
the number x of males in the 200 births would be at least as great as 110 or at
least as small as 90. Now the probability that x = 110 or more is
approximately the probability that

$$z \geq \frac{109.5 - 100}{\sqrt{50}} = \frac{9.5\sqrt{2}}{10} = 1.3435$$

This probability is 0.0896, and there is an equal probability that x = 90 or
less. The probability of a deviation from expectation at least as great as
that found in the sample, on the hypothesis that the true value of $\theta$ is 0.5, is
therefore about 0.18, or a little more than 1/6, which is hardly small enough
for us to feel safe in rejecting the hypothesis that $\theta$ is really 0.5. We say in
such a case that the deviation from expectation is non-significant. Where
we draw the line is a matter of convention.

Statisticians have more or less agreed to regard a deviation from
expectation as significant if the probability of obtaining by pure chance a deviation
at least as great as this is less than 0.05, and to regard it as highly significant
if the probability is less than 0.01. To some extent, of course, the limits of
significance will depend on the seriousness of making a mistake. If it is
going to be a costly matter to reject wrongly some hypothesis about $\theta$, we
shall want to be very sure that we are right, and perhaps we shall want the
probability of a chance deviation as great as that observed to be less than
0.005. However, in most of the problems in this book we shall accept the
conventional significance levels of 0.05 and 0.01. If the probability calculated
is less than 0.05, we shall say that the hypothesis regarding $\theta$ is rejected at
the 0.05 level of significance. If the probability is less than 0.01, we shall say
that the hypothesis is rejected at the 0.01 level of significance. If the
probability is greater than 0.05, we shall say that the hypothesis is not rejected
at the 0.05 level. This is the situation in our example. On the basis of the
sample, we could not reasonably reject the hypothesis that male and female
births have an equal probability. Of course, new and larger samples might
very well cause us to revise our opinion.
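
The calculation for the births example can be sketched in Python as follows. This is an illustration under the assumptions of this section (normal approximation with the half-unit correction, and a two-sided deviation); the function names are hypothetical, and the two-sided probability is taken as twice the one-sided tail, which is exact for the normal curve.

```python
from math import erf, sqrt


def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def two_tailed_probability(x, s, theta):
    """Approximate probability of a deviation from s*theta at least as large
    as the observed one, in either direction."""
    mu = s * theta
    sigma = sqrt(s * theta * (1 - theta))
    z = (abs(x - mu) - 0.5) / sigma  # half-unit correction
    return z, 2.0 * (1.0 - phi(z))


z, probability = two_tailed_probability(110, 200, 0.5)
print("z = %.4f, probability = %.3f" % (z, probability))
for level in (0.05, 0.01):
    verdict = "rejected" if probability < level else "not rejected"
    print("hypothesis %s at the %.2f level" % (verdict, level))
```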

11.3 Tests of Hypotheses. One-Tailed and Two-Tailed Tests. In the
example discussed in §11.2, a hypothesis was made about the parameter $\theta$,
namely, that $\theta = 0.5$. This is called a simple hypothesis, because it fixes the
binomial distribution completely (for a given sample size s). A hypothesis
which does not completely specify the distribution is said to be composite.
The proportion p of individuals in a sample having the characteristic under
discussion (of which the probability in the population was $\theta$) was used to test
the assumed value of $\theta$. The method adopted was to obtain the probability
distribution for p and to calculate (at least approximately) the probability
of getting, in a random sample of size s, a proportion at least as different from
$\theta$ as the observed p itself. This involved finding the areas under the standard
normal curve, both for z greater than a certain value $z_1$ and for z less than
$-z_1$, where $z_1$ was calculated from p. Since these areas correspond to the
upper and lower “tails” of the normal curve, the test is often referred to as
a two-tailed test. We always use a two-tailed test if we are interested in the
amount of the deviation from expectation, without caring very much in what
direction it goes, but situations do arise sometimes where we have good
reason to believe, before we perform the experiment or obtain the observations,
that the deviation, if it occurs, will be in one direction only. If so, we are
justified in using a one-tailed test.

Example 2. The same test is given to 100 subjects twice, and 60 of these get a better
score on the second attempt than on the first. Is this fact significant?

The hypothesis tested is that the repetition makes no difference to the scores. Let us
suppose we have decided in advance that the only possible effect of repetition, if there is any
effect at all, must be an improvement in the scores. The question then arises as to the
significance of the amount by which the observed p exceeds 0.5. The equivalent normal value is
$(59.5 - 50)/(25)^{1/2} = 1.9$, and the area under the normal curve for z > 1.9 is 0.029. That
is to say, the probability of 60 or more out of 100 getting an improved score by chance, if
actually repetition makes no difference, is about 0.029, and so the result is judged significant
by the usual criterion. If we used a two-tailed test, the probability that the number getting
a better score is either 60 or more or 40 or less would be 0.058, and this would usually be
judged non-significant.

A hypothesis that is tested for possible rejection, such as the hypothesis
that $\theta = 0.5$ in Example 2, is called a null hypothesis. This is tested
against an alternative hypothesis, which, in this example, is that $\theta > 0.5$ (a
one-sided alternative). On the basis of the observed results, the null hypothesis
is rejected at the 5% level of significance, and therefore the alternative
hypothesis is accepted at the same level. Notice that the null hypothesis is not
disproved. All we can say is that either the null hypothesis is untrue or that
a very unusual event has occurred, namely, an event with a probability of
less than 1 in 30. We shall usually prefer to make the former statement,
but if we do so we stand a chance, even though a small one, of being wrong.
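
The difference between the two kinds of test in Example 2 can be seen from a few lines of Python. The sketch is illustrative only; the function names are hypothetical, and the one-sided alternative is assumed to be an excess over the expected number, as in the example.

```python
from math import erf, sqrt


def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def upper_tail_probability(x, s, theta):
    """Approximate probability of x or more successes in s trials."""
    mu = s * theta
    sigma = sqrt(s * theta * (1 - theta))
    return 1.0 - phi((x - 0.5 - mu) / sigma)


one_tailed = upper_tail_probability(60, 100, 0.5)  # alternative: theta > 0.5
two_tailed = 2.0 * one_tailed                      # deviation in either direction

print("one-tailed: %.3f" % one_tailed)
print("two-tailed: %.3f" % two_tailed)
```

The same observed count is thus judged significant at the 0.05 level on the one-tailed test and non-significant on the two-tailed test, which is why the choice must be made before the observations are examined.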

11.4 Confidence Limits for the Binomial Parameter. Instead of using
the sample to test an assumed value of $\theta$, we can regard it as giving an estimate
of an otherwise unknown $\theta$. Since the expectation of p is $\theta$ itself, p provides
an unbiased estimate. We know, however, that p will vary from one sample
to another, even though the sample size remains constant, and therefore our
estimate is subject to sampling error. In fact, the sampling variance of
p is $\theta(1 - \theta)/s$. For any observed value of p we can set limits which will be
wide enough to include the true value of $\theta$ with any required degree of
confidence, say, 0.95. These limits are called the 95% confidence limits for $\theta$,
for a reason that will now be explained.

The true value $\theta$, although unknown, is not a random variable, so that we
should not speak of the probability that $\theta$ lies between the confidence limits.
Rather it is p which is the random variable, and there is a probability that
the confidence interval (between the upper and lower confidence limits), which
is a function of p, will include the true value $\theta$. If we take many samples and
calculate many 95% confidence intervals, then about 95% of them will include
the true value, so that in saying of one confidence interval that it does include
the true value we stand only a 5% chance of being wrong, and are therefore
reasonably confident of being right.

We now consider how to calculate these limits, and once again we appeal
to the normal approximation to the binomial distribution. We know that
$z = (sp - s\theta)/[s\theta(1 - \theta)]^{1/2}$ is approximately N(0, 1), and that the two
symmetrical values of z which between them include 95% of the whole area
of the normal distribution are ±1.96. (The area of the tail beyond z = 1.96
is 0.025, and there is an equal area in the other tail below −1.96.) There
is therefore a probability of 0.95 that, if $\theta$ is the true value of the parameter,

$$-1.96 < \frac{sp - s\theta}{[s\theta(1 - \theta)]^{1/2}} < 1.96 \tag{11.2}$$

This inequality can be written as

$$(sp - s\theta)^2 < (1.96)^2\,s\theta(1 - \theta)$$

or

$$s(p - \theta)^2 < 3.84\,\theta(1 - \theta) \tag{11.3}$$

Collecting together the terms in $\theta$ and $\theta^2$, we obtain

$$\theta^2(s + 3.84) - \theta(2ps + 3.84) + sp^2 < 0 \tag{11.4}$$

Now the quadratic expression $a\theta^2 + b\theta + c$ is negative, for a > 0, when
$\theta$ lies between the limits $(-b \pm \sqrt{b^2 - 4ac})/2a$, which are the roots of the
equation obtained by replacing the inequality by an equality. The upper
and lower confidence limits are therefore given by the two roots of the
quadratic (11.4) with the left-hand side equal to 0. One root corresponds to the
right-hand inequality in (11.2) and the other to the left-hand inequality.
A general expression can be written for these roots, but the procedure is best
illustrated by a numerical example.


Example 3. In a sample of 200 persons interviewed in a public opinion poll, 124 said
“yes” to a certain question and the remainder said “no.” (For simplicity we are ignoring
those persons who were undecided.) What are the 95% confidence limits for the proportion
of persons in the population sampled (supposed very large) who would answer “yes”?

Here p = 0.62, s = 200, and the quadratic equation for $\theta$ is
$203.84\,\theta^2 - 251.84\,\theta + 76.88 = 0$. The roots are $\theta$ = 0.551 and 0.684,
which provide the required confidence limits.

When s is fairly large, as in this example, it is unnecessary to correct the observed x by
adding or subtracting 1/2 in forming the normal approximation. If greater accuracy is
desired, however, we must replace the inequalities in (11.2) by the following pair:

$$\frac{sp - \frac{1}{2} - s\theta}{[s\theta(1 - \theta)]^{1/2}} \leq 1.96, \qquad \frac{sp + \frac{1}{2} - s\theta}{[s\theta(1 - \theta)]^{1/2}} \geq -1.96 \tag{11.5}$$

The first gives $\theta \geq 0.549$ and the second $\theta \leq 0.687$, so that the confidence limits are only
slightly affected.

If we want 90% confidence limits, the number 1.96 in (11.2) or (11.5) must be replaced by
1.645, and if we want 99% confidence limits by 2.576. The values for other confidence
limits can be obtained from tables of the normal law, but these are the ones usually adopted.
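
The roots of the quadratic can be computed directly. The Python sketch below is an illustration, with hypothetical function names; it uses $(1.96)^2$ unrounded where the text uses 3.84, and it handles the corrected form (11.5) by solving one quadratic for each limit, taking the smaller root for the lower limit and the larger root for the upper limit.

```python
from math import sqrt


def quadratic_roots(a, b, c):
    """Both real roots of a*t^2 + b*t + c = 0, smaller root first (a > 0)."""
    d = sqrt(b * b - 4.0 * a * c)
    return (-b - d) / (2.0 * a), (-b + d) / (2.0 * a)


def confidence_limits(x, s, z=1.96, corrected=False):
    """Limits for theta from the normal approximation.

    x is the observed number with the characteristic (x = s*p).
    With corrected=True the half-unit correction of (11.5) is applied.
    """
    half = 0.5 if corrected else 0.0

    def roots(count):
        # (count - s*theta)^2 = z^2 * s * theta * (1 - theta), divided by s
        return quadratic_roots(s + z * z, -(2.0 * count + z * z), count * count / s)

    lower = roots(x - half)[0]
    upper = roots(x + half)[1]
    return lower, upper


print(confidence_limits(124, 200))                  # Example 3, from (11.4)
print(confidence_limits(124, 200, corrected=True))  # Example 3, from (11.5)
print(confidence_limits(124, 200, z=1.645))         # 90% limits
```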



11.5 Confidence Interval Charts.

A convenient set of charts for reading off the two values of $\theta$ corresponding
to given values of p and s has been devised by Clopper and Pearson
(Reference 1). For each value of s two curves connecting p and $\theta$ are drawn,
as in Fig. 42. The ordinates at any given p are the corresponding upper
and lower confidence limits. For the larger values of s the curves are drawn
from equation (11.2), but for small values of s the normal approximation
is not adequate and the binomial function itself must be used. Recently
published tables of the binomial distribution (see Reference 2) make it easier
to obtain the values of $\theta$ for which

$$\sum_{x=sp}^{s} C(s, x)\,\theta^x (1 - \theta)^{s-x} = 0.025$$

and for which

$$\sum_{x=0}^{sp} C(s, x)\,\theta^x (1 - \theta)^{s-x} = 0.025,$$

these values being respectively the lower and upper confidence limits
corresponding to the observed p. Thus if s = 10 and p = 0.5, we find from the
tables that the limits are 0.187 and 0.813. The values given by (11.5) are
0.201 and 0.798, which differ appreciably from the true values.

Fig. 42

Clopper and Pearson's chart for 95% confidence limits is reproduced as
Chart I in the Appendix. Notice that the smaller the sample size, the wider
the confidence limits. The 90% confidence limits are narrower than the
95% limits, because the more precise the statement about $\theta$, the more chance
there is of being wrong in making it.
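
Where neither charts nor tables are at hand, the two equations for the exact limits can be solved numerically. The following Python sketch uses simple bisection; it is an illustration with hypothetical function names, and it assumes 0 < x < s, where x = sp is the observed number with the characteristic.

```python
from math import comb


def prob_at_least(x, s, theta):
    """Binomial probability of x or more successes in s trials."""
    return sum(
        comb(s, k) * theta**k * (1 - theta) ** (s - k) for k in range(x, s + 1)
    )


def solve_increasing(g, target, lo=0.0, hi=1.0, steps=60):
    """Bisection for g(theta) = target, where g increases with theta."""
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def exact_limits(x, s, tail=0.025):
    """Lower and upper limits from the binomial sums (assumes 0 < x < s)."""
    # Lower limit: the probability of x or more equals `tail`.
    lower = solve_increasing(lambda t: prob_at_least(x, s, t), tail)
    # Upper limit: the probability of x or fewer equals `tail`, that is,
    # the probability of x + 1 or more equals 1 - tail.
    upper = solve_increasing(lambda t: prob_at_least(x + 1, s, t), 1.0 - tail)
    return lower, upper


lower, upper = exact_limits(5, 10)  # s = 10, p = 0.5
print(round(lower, 3), round(upper, 3))
```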

11.6 Mean and Variance of a Linear Combination of Independent Variates. 

In the next section we shall consider the significance of a difference in the 
observed proportions between two independent samples, and we shall need 
Theorem 1, which follows. A straightforw^ard, but rather long, elementary 
proof can be given, but as the result is the important thing here, we refer 
the reader to a more sophisticated proof in Part Two, §4.11. 

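Theorem 1. If $x_1$ and $x_2$ are independent variates with means $\mu_1$ and $\mu_2$ and
variances $\sigma_1^2$ and $\sigma_2^2$, and if $L = c_1 x_1 + c_2 x_2$ is a linear combination of $x_1$ and
$x_2$, then the mean of $L$ is $\mu_L = c_1\mu_1 + c_2\mu_2$ and the variance of $L$ is

$\sigma_L^2 = c_1^2\sigma_1^2 + c_2^2\sigma_2^2.$

For example, suppose $c_1 = 1$ and $c_2 = -1$; then $L = x_1 - x_2$ and we have
the result that

(11.6)    $\sigma_{x_1 - x_2}^2 = \sigma_1^2 + \sigma_2^2$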
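
Theorem 1 can be checked exactly on small discrete distributions by building the distribution of $L$ term by term. The sketch below is an illustration only: the fair die and the biased 0/1 variate are hypothetical choices, and the helper names are not taken from the text.

```python
from fractions import Fraction
from itertools import product

def mean_var(dist):
    """Mean and variance of a discrete distribution given as {value: probability}."""
    m = sum(v * p for v, p in dist.items())
    var = sum((v - m) ** 2 * p for v, p in dist.items())
    return m, var

def linear_combination(d1, d2, c1, c2):
    """Exact distribution of L = c1*x1 + c2*x2 for independent x1 and x2."""
    out = {}
    for (v1, p1), (v2, p2) in product(d1.items(), d2.items()):
        key = c1 * v1 + c2 * v2
        out[key] = out.get(key, 0) + p1 * p2   # independence: probabilities multiply
    return out

# Hypothetical variates: a fair die and a 0/1 variate with P(1) = 1/4.
die = {face: Fraction(1, 6) for face in range(1, 7)}
flag = {0: Fraction(3, 4), 1: Fraction(1, 4)}

m1, v1 = mean_var(die)
m2, v2 = mean_var(flag)
mL, vL = mean_var(linear_combination(die, flag, 1, -1))
print(mL == m1 - m2, vL == v1 + v2)  # the special case (11.6)
```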

11.7. Significance of a Difference Between Two Sample Proportions. 

Suppose we have two independent random samples which may be of different
sizes $s_1$ and $s_2$, and suppose in each we observe the number of individuals, say
$x_1$ and $x_2$ respectively, possessing a certain characteristic. Then the sample
proportions are $p_1 = x_1/s_1$ and $p_2 = x_2/s_2$, and these will, in general, differ.
We can test the null hypothesis that in fact both samples came from the
same population in which the true proportion is $\theta$.

On the null hypothesis, the expectations of $p_1$ and $p_2$ are both equal to $\theta$,
so that the expectation of $p_1 - p_2$ is zero. The variance of $p_1$ is $\theta(1 - \theta)/s_1$
and that of $p_2$ is $\theta(1 - \theta)/s_2$; since $p_1$ and $p_2$ are independent variates, the
variance of $p_1 - p_2$ is the sum of the respective variances, by the theorem in
§11.6. Therefore, for large values of $s_1$ and $s_2$, we can treat

$z = \dfrac{p_1 - p_2}{[\theta(1 - \theta)(1/s_1 + 1/s_2)]^{1/2}}$

as a standard normal variate, and estimate the probability of a value of $z$ at
least as great as that observed, for an assumed value of $\theta$.

Example 4. In a referendum submitted to the student body at a university, 850 men
and 566 women voted. 530 of the men and 304 of the women voted “yes.” Does this
indicate a significant difference of opinion on the matter, at the 1% level, between men and
women students?

The null hypothesis is that the proportion $\theta$ who would vote “yes” in a very large student
population is the same for men and women. We do not know the value of $\theta$ and can only
estimate it from the combined results. The over-all proportion of students voting “yes”
is 834/1416 = 0.589, and this may be taken as an estimate of $\theta$. Also, we know that
$p_1 = 530/850 = 0.6235$, $p_2 = 304/566 = 0.5371$, so that

$z = 0.0864 \Big/ \left[0.589 \times 0.411 \left(\tfrac{1}{850} + \tfrac{1}{566}\right)\right]^{1/2} = 3.24$

Since any value of $z$ greater than 2.576 indicates significance at the 1% level, the question
is answered in the affirmative; in other words, the null hypothesis is rejected at this level.
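
The arithmetic of Example 4 can be sketched in a few lines. The function name is an assumption made for this illustration; the pooled proportion is used as the estimate of $\theta$, as in the text.

```python
from math import sqrt

def two_proportion_z(x1, s1, x2, s2):
    """z for the difference of two sample proportions under a common theta."""
    p1, p2 = x1 / s1, x2 / s2
    theta_hat = (x1 + x2) / (s1 + s2)          # pooled estimate of theta
    se = sqrt(theta_hat * (1 - theta_hat) * (1 / s1 + 1 / s2))
    return (p1 - p2) / se

z = two_proportion_z(530, 850, 304, 566)
print(round(z, 2))            # compare with 3.24 in the text
print(abs(z) > 2.576)         # significant at the 1% level (two-tailed)
```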

11.8 Confidence Limits for the Difference in Proportions. If we do not
wish to make the null hypothesis that the two samples are from the same
population, we can assume that they come from two different populations
with probabilities $\theta_1$ and $\theta_2$, and obtain confidence limits for the difference
$\theta_1 - \theta_2$. The variable

(11.7)    $z = [p_1 - p_2 - (\theta_1 - \theta_2)] \Big/ [\theta_1(1 - \theta_1)/s_1 + \theta_2(1 - \theta_2)/s_2]^{1/2}$

is approximately normally distributed, for large $s_1$ and $s_2$, so that we can
obtain 95% confidence limits by putting $z = \pm 1.96$. Unfortunately, $\theta_1$ and
$\theta_2$ are unknown, but we can approximate them, in the denominator* of (11.7),
by putting $p_1$ for $\theta_1$ and $p_2$ for $\theta_2$. We have then the equation

(11.8)    $p_1 - p_2 - (\theta_1 - \theta_2) = \pm 1.96\,[p_1(1 - p_1)/s_1 + p_2(1 - p_2)/s_2]^{1/2}$

for determining the two values of $\theta_1 - \theta_2$ which are the lower and upper
confidence limits. Thus, in Example 4, the equation is

$\theta_1 - \theta_2 = 0.0864 \pm 1.96\,[0.000276 + 0.000439]^{1/2}$
$\qquad\quad\; = 0.0864 \pm 0.0524$
$\qquad\quad\; = 0.034$ or $0.139$

The 95% confidence limits are therefore 0.034 and 0.139, and since these
limits do not include zero, we can say that there is a significant difference at
the 5% level between the opinions of men and women students on the subject
of the referendum. The 99% limits are 0.018 and 0.155, and still do not
include 0.

* Since we want the difference between $\theta_1 - \theta_2$ and $p_1 - p_2$, we obviously cannot make the
approximation in the numerator as well. The denominator does not contain this difference.
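
As a check on the figures just quoted, equation (11.8) can be evaluated directly for the referendum data. This is a minimal sketch; the function name and the idea of passing the normal multiplier as a parameter are assumptions of the illustration.

```python
from math import sqrt

def difference_limits(x1, s1, x2, s2, z=1.96):
    """Confidence limits for theta1 - theta2 from (11.8)."""
    p1, p2 = x1 / s1, x2 / s2
    # Denominator of (11.7) with p1, p2 substituted for theta1, theta2.
    se = sqrt(p1 * (1 - p1) / s1 + p2 * (1 - p2) / s2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

print([round(v, 3) for v in difference_limits(530, 850, 304, 566)])           # 95% limits
print([round(v, 3) for v in difference_limits(530, 850, 304, 566, z=2.576)])  # 99% limits
```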

11.9 Binomial Probability Paper. A special graph paper, designed by
Mosteller and Tukey (see Reference 3), enables problems on the significance
of proportions to be solved approximately with great ease. A specimen of
the paper is shown in Fig. 43. The scales are “square root” scales, that is
to say, the distances of points marked $x$ and $y$ from the origin are proportional
to $x^{1/2}$ and to $y^{1/2}$ respectively. A quarter-circle is drawn through the points
marked 100 on the axes, and on this circle $x + y = 100$. A straight line
through the origin passes through points for which $y/x$ is constant, and is
called a split. Thus a 50-50 split is a line passing through the point on the
quarter-circle of coordinates 50, 50.

Fig. 43. Binomial Probability Graph Paper

Suppose in a sample of 10 we find 7 with a certain characteristic, which we
will call “success.” Since there are 3 in the sample which are “failures,” we
say that the paired count for the sample is (7, 3). This is plotted on the
graph paper as a right-angled triangle, with the right angle at (7, 3) and
sides each one unit long, parallel to the axes (see Fig. 43). When one of the
coordinates is larger than about 100, the one unit length is scarcely more
than the width of a pencil line, and the triangle becomes a short segment.
If both coordinates are large, the triangle reduces to a point.

In order to test whether the observed proportion of successes is
significantly different from a hypothetical $\theta$ of 1/2, we measure the perpendicular
distance from the plotted triangle to the 50-50 split. When the numbers
are small, there are two distances, called the short and the long distance,
measured from the two acute angles of the triangle. These distances are
interpreted by reference to the scale at the top of the paper (marked Full
Scale). It may be noted that the distance 0 to 1 on this scale is almost exactly
5 mm. The long and short distances each give a significance level, and the
observed result is significant at some level between. When both coordinates
are large there is only one significance level. The scale is a normal probability
one, giving values of $z$ corresponding to the observed deviation. Thus, 2
units on this scale correspond very nearly to the 5% level of significance (for
a two-tailed test). In the illustration above, the short and long distances
are 1.0 and 1.6, so that the result is not significant at the 5% level.

In Example 3 of §11.4, the paired count is (124, 76). If this is plotted
(as P in Fig. 43), and two splits are drawn at a perpendicular distance of 2
units from the triangle (practically a point), the coordinates of these splits
provide upper and lower 95% confidence limits for $\theta$. In the diagram these
splits are (55, 45) and (69, 31) so that the lower and upper confidence limits
are 0.55 and 0.69, agreeing quite well with the results calculated earlier.

This type of graph paper has many other uses which are explained and 
illustrated in the paper quoted as Reference 3. 

11.10 Sampling from a Finite Population. In all our illustrations in this 
chapter, we have supposed that the parent population which is sampled may 
be taken as infinite, or practically so. However, examples sometimes occur 
of sampling from a finite population, of a size comparable with the sample, 
and then the formulas for testing significance must be modified. 

It is proved in Part Two that if a sample of size $s$ is drawn from a binomial
population of size $n$, the variance of $x$ (the number of “successes” in the
sample) is equal to what it would be if the population were infinite, multiplied
by the factor $(n - s)/(n - 1)$. The distribution of $x$ in this case is called
“hypergeometric.” The mean of $x$ is the same as in the pure binomial
distribution. The only change we have to make in our formulas is therefore to
replace $s\theta(1 - \theta)$ by $s\theta(1 - \theta)(n - s)/(n - 1)$.

Example 5. A telephone poll was taken of University of Alberta students before a student
election. Of 78 students contacted, 33 said they would vote for Mr. S. The population
(of students with telephones) may be taken as 2200. Obtain 95% confidence limits for the
proportion of voters in the population in favor of Mr. S.

Here $s = 78$, $n = 2200$, $p = 33/78 = 0.423$. Instead of equation (11.2) we now have

(11.9)    $-1.96 < \dfrac{s(p - \theta)}{[s\theta(1 - \theta)]^{1/2}} \left(\dfrac{n - 1}{n - s}\right)^{1/2} < 1.96$

or

(11.10)    $s(p - \theta)^2 < 3.84\,\dfrac{n - s}{n - 1}\,\theta(1 - \theta)$

Then    $78(0.423 - \theta)^2 < 3.84 \times \dfrac{2122}{2199}\,\theta(1 - \theta)$

whence the limits of $\theta$ are given by

$81.7\theta^2 - 69.7\theta + 14.0 = 0$

The roots are 0.32 and 0.53, which are the required confidence limits.
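
The quadratic of Example 5 can be formed and solved in general. The sketch below assumes the inequality $s(p - \theta)^2 < z^2\,\frac{n - s}{n - 1}\,\theta(1 - \theta)$ with $z = 1.96$ (so $z^2 = 3.84$ to two decimals); the function name is hypothetical.

```python
from math import sqrt

def finite_population_limits(x, s, n, z=1.96):
    """Limits for theta when s items are sampled from a finite population of n."""
    p = x / s
    c = z * z * (n - s) / (n - 1)      # 3.84 times the finite-population factor
    # s(p - theta)^2 = c*theta*(1 - theta) rearranged as a*theta^2 + b*theta + k = 0
    a = s + c
    b = -(2 * s * p + c)
    k = s * p * p
    root = sqrt(b * b - 4 * a * k)
    return (-b - root) / (2 * a), (-b + root) / (2 * a)

lower, upper = finite_population_limits(33, 78, 2200)
print(round(lower, 2), round(upper, 2))  # compare with 0.32 and 0.53 in the text
```

For a very large $n$ the factor $(n - s)/(n - 1)$ approaches 1 and the limits widen slightly toward the infinite-population values.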


Exercises 

1. If 500 coins are tossed, what is the probability that (a) the number of heads will differ
from 250 by less than 20, (b) at least 260 coins will show heads?

2. In a certain game, called “Twenty-six,” played with ten dice, a player chooses some
number, such as 4, and undertakes to throw 26 or more fours in thirteen throws with the
ten dice. If he succeeds the bank pays him 4 to 1. How much should the bank pay to
make the game fair? Ans. 4.43 to 1.

Hint: Find the normal approximation to the probability of 26 or more fours, with
130 throws of a die (equivalent to 13 throws with 10 dice). If the player stakes $1 and the
bank $M, and if the probability of success is p, then for a fair game pM - (1 - p) · 1 = 0.

3. Out of a very large Grade 12 population of high school students, 40% fail a university
entrance examination. In a certain class of 50 the number of failures was 14. Is this devia-
tion from the general average large enough to warrant a presumption that the class is not a
random sample from the Grade 12 population?

4. Two groups of mice, one of 50 and the other of 60, are comparable in respect of age,
weight, general condition, etc., and have both been given injections of a virus. The first
group, however, has also been given a certain drug. After three days, the number of
deaths in the first group was 12 and in the second 19. Is the difference in mortality rates
significant?

5. 400 eggs are taken at random from a large consignment and 60 are found to be bad.
What are the 99% confidence limits for the percentage of bad eggs in the whole consignment?

6. For a certain year in Canada the deaths recorded for married men and single men
respectively in the age group 25 to 44 were 3471 and 2307. If the whole population in this
age group numbered 377,000 single men and 940,000 married men, could the single men be
reasonably regarded as a random sample of the whole male population, as far as mortality
rate is concerned?

7. In a large city 400 voters were chosen at random and asked whether they would vote
for Candidate A at the next election; 280 said they would. What are the 99% confidence
limits for the proportion of voters in the city who intend to vote for A?

8. A physician treats 20 patients suffering from a certain disease and 11 of them die. The
mortality rate in this disease, based on thousands of reported cases, is 42%. Can the sample
be regarded as exceptional?

9. (Mainland) In testing the efficacy of a drug said to prevent seasickness, 25 men who
always developed symptoms of sickness when subjected to the motion of a rocking machine
were given the drug. Again tested, 15 were now found to be immune to the motion. What
are the 95% confidence limits for the proportion of men liable to seasickness who would be
rendered immune by taking this drug?

10. In examining the pedigrees of a number of families the occurrence of a particular
hereditary defect is observed in 14 out of 25 children. If a certain genetic mechanism is at
work, the proportion of these children affected should be 1/4. Are the data consistent with
this mechanism?

11. (Cres/n, quoted by In a poll of 148 men and 152 women the question was
asked, “Do you approve of the practice of tipping, by and large?” and 89 of the men and
116 of the women answered “yes.” Construct 95% confidence limits for the difference
between the proportion of “yeses” among the male population sampled and the proportion
among the women sampled. (Assume that these populations are very large compared with
the sample sizes.)

12. Random samples of 50 students each from the freshman class in Arts and Science and
the freshman class in Engineering are given a mathematical aptitude test. The numbers
reaching a pass standard are d5 and 41, respectively. Assuming that the whole Arts and
Science class includes 248 students and the Engineering class 1S7, test at the 5% level the
null hypothesis that the proportion of successes is the same in both classes.

CHAPTER XII

SIGNIFICANCE OF MEANS AND VARIANCES


12.1 Distribution of the Sample Mean. It is proved in Part Two that, if
a great number of random samples of size $N$ are picked from an infinitely
large population, and if the arithmetic mean $\bar{x}$ is computed for each sample,
then $\bar{x}$ has a probability distribution for which the moments can be calculated
in terms of the moments of the parent population. If we denote by $\mu_{\bar{x}}$, $\sigma_{\bar{x}}^2$,
$\gamma_{1,\bar{x}}$, etc., the mean, variance, skewness, etc., for the distribution of the mean,
and if we let $\mu$, $\sigma^2$, $\gamma_1$, etc., denote the corresponding quantities for the parent
population, then we can show that

(12.1)    $\mu_{\bar{x}} = \mu$

(12.2)    $\sigma_{\bar{x}}^2 = \sigma^2/N$

(12.3)    $\gamma_{1,\bar{x}} = \gamma_1/N^{1/2}$

(12.4)    $\gamma_{2,\bar{x}} = \gamma_2/N$

The mean of all the means is therefore equal to the mean of the parent
population, but the standard deviation and the skewness are equal to the
corresponding population values divided by $N^{1/2}$, while the kurtosis is the
population value divided by $N$. It follows that if $N$ is fairly large and if $\gamma_1$
and $\gamma_2$ are moderate, the skewness and kurtosis of the distribution of means
will be near to zero, and it appears that the distribution of means will some-
what resemble a normal distribution. If it happens that the parent popula-
tion is itself normally distributed, then $\gamma_1 = \gamma_2 = 0$, so that the skewness
and kurtosis of the distribution of means are also zero, and in fact the distribu-
tion of means is then itself exactly normal with mean $\mu$ and variance $\sigma^2/N$.

The results expressed by equations (12.1) to (12.4) are of great importance
in estimating the significance of differences in the means between two samples
and in finding confidence limits for the mean of the parent population from
the mean of a sample. For proofs, the reader may refer to Part Two, §§4.1b
and 6.8. Proofs involving only elementary algebra can be given, but are
likely to be rather long and tedious. We give, to illustrate the type, the
proof of (12.1) for a discrete distribution and for a sample of only two
individuals.

Let us suppose that in the population the variate $x$ can take the distinct
values $x_1, x_2, \ldots, x_k$, with probabilities $p_1, p_2, \ldots, p_k$. For the sample, two
items are picked at random, say $x_\alpha$ and $x_\beta$ (these may, of course, be equal)
and the sample mean is

(12.5)    $\bar{x} = (x_\alpha + x_\beta)/2$

Since the two items are independent, the probability of picking just these
two is the product of the probabilities for the two separately, namely, $p_\alpha p_\beta$.
The mean $\mu_{\bar{x}}$ of the values of $\bar{x}$ for all possible samples of two is given by
multiplying $\bar{x}$ by the probability of the sample and summing over all possible
values of $\alpha$ and $\beta$, from 1 to $k$. That is,

$\mu_{\bar{x}} = \sum_{\alpha,\beta} \bar{x}\,p_\alpha p_\beta$

(12.6)    $= \tfrac{1}{2}\sum_{\alpha,\beta} (x_\alpha + x_\beta)\,p_\alpha p_\beta$

Now $\sum_\alpha p_\alpha = \sum_\beta p_\beta = 1$, since each of these is simply the sum of the prob-
abilities for all possible values of $x$. Also, by definition of the population
mean,

(12.7)    $\mu = \sum_\alpha x_\alpha p_\alpha = \sum_\beta x_\beta p_\beta$

the two sums being the same since $\alpha$ and $\beta$ both have the same domain of
values 1 to $k$. Therefore, from (12.6),

$\mu_{\bar{x}} = \tfrac{1}{2}\left(\sum_\beta p_\beta \sum_\alpha x_\alpha p_\alpha + \sum_\alpha p_\alpha \sum_\beta x_\beta p_\beta\right)$

(12.8)    $= \tfrac{1}{2}\mu\left(\sum_\alpha p_\alpha + \sum_\beta p_\beta\right) = \mu$

which is the same as (12.1).

The proof for the variance is similar, using the definition

(12.9)    $\sigma_{\bar{x}}^2 = \sum_{\alpha,\beta} \bar{x}^2 p_\alpha p_\beta - \mu_{\bar{x}}^2$

and

(12.10)    $\sigma^2 = \sum_\alpha x_\alpha^2 p_\alpha - \mu^2 = \sum_\beta x_\beta^2 p_\beta - \mu^2$

but we will not give it in detail. It turns out that $\sigma_{\bar{x}}^2 = \tfrac{1}{2}\sigma^2$, which is the
same as (12.2) for the special case $N = 2$. The extension of these proofs to
three, four, or more items in the sample should be obvious.
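
The proof for samples of two can be mirrored by brute-force enumeration. In the sketch below the three-valued population and its probabilities are hypothetical, chosen only so that the sums are short; exact fractions are used so that the equalities hold without rounding.

```python
from fractions import Fraction
from itertools import product

# Hypothetical discrete population: values x_alpha with probabilities p_alpha.
values = [0, 1, 4]
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
pairs = list(zip(values, probs))

mu = sum(x * p for x, p in pairs)                      # population mean, as in (12.7)
sigma2 = sum(x * x * p for x, p in pairs) - mu**2      # population variance, as in (12.10)

mu_mean = Fraction(0)
second_moment = Fraction(0)
for (xa, pa), (xb, pb) in product(pairs, repeat=2):    # all ordered samples of two
    xbar = Fraction(xa + xb, 2)                        # sample mean (12.5)
    mu_mean += xbar * pa * pb
    second_moment += xbar**2 * pa * pb
sigma2_mean = second_moment - mu_mean**2               # variance of the mean, as in (12.9)

print(mu_mean == mu, sigma2_mean == sigma2 / 2)
```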

12.2 An Illustration of the Distribution of Means. An experiment in
sampling was carried out in class as follows: A “population” was constructed
by writing numbers from 0 to 24 on 1000 metal-edged cardboard tags. The
distribution of the numbers was as given in Table 38, and it is clear that this
distribution is markedly skew.

TABLE 38. PARENT POPULATION IN SAMPLING EXPERIMENT

     x       f          x       f
     0       1         13      32
     1      23         14      26
     2      61         15      20
     3      92         16      16
     4     106         17      13
     5     100         18      10
     6      94         19       8
     7      87         20       6
     8      78         21       4
     9      69         22       3
    10      59         23       2
    11      49         24       1
    12      40

The numbered discs were put into a goldfish bowl and well mixed. A 
sample of 10 discs was withdrawn, and the numbers were noted before the 
discs were replaced. The process was then repeated until a few hundred 
samples had been obtained, and a frequency distribution of the means was 
constructed. Over a period of several years, the data summarized in Table 39 
were collected. 

TABLE 39. DISTRIBUTION OF MEANS OF 2100 SAMPLES OF 10 FROM POPULATION OF TABLE 38

    Class Limits      f     Midpoint     u      fu     fu²
     3.0- 3.9         1       3.45      -4      -4      16
     4.0- 4.9        27       4.45      -3     -81     243
     5.0- 5.9       210       5.45      -2    -420     840
     6.0- 6.9       463       6.45      -1    -463     463
     7.0- 7.9       559       7.45       0       0       0
     8.0- 8.9       477       8.45       1     477     477
     9.0- 9.9       234       9.45       2     468     936
    10.0-10.9        99      10.45       3     297     891
    11.0-11.9        23      11.45       4      92     368
    12.0-12.9         6      12.45       5      30     150
    13.0-13.9         1      13.45       6       6      36
    Total          2100                        402    4420

For the population of Table 38, the following values may be calculated:

(12.11)
    $\mu = 7.601$
    $\sigma^2 = 19.57$
    $\gamma_1 = 0.896$
    $\gamma_2 = 0.508$

For the distribution of Table 39, we find (using Sheppard's corrections)

(12.12)
    $\mu_{\bar{x}} = 7.64$
    $\sigma_{\bar{x}}^2 = 1.985$
    $\gamma_{1,\bar{x}} = 0.343$
    $\gamma_{2,\bar{x}} = 0.002$

These are actually the sample statistics, for the sample of 2100 means, but
this number is so large that the estimates for the whole population of possible
means will not differ appreciably from the sample statistics. The results in
(12.12) may be compared with the expected values,
$\mu = 7.60$, $\sigma^2/10 = 1.957$, $\gamma_1/\sqrt{10} = 0.283$, and $\gamma_2/10 = 0.051$,
and are seen to be of about the right size. At the present stage we cannot
go further in judging the agreement. The distributions of Tables 38 and
39 are graphed in Fig. 44, and this shows clearly the reduction in standard
deviation and skewness brought about by taking the mean of even so small
a sample as 10. The parent population is actually finite, but the correction
for this is not important, since the population is 100 times as large as the
sample. (See §12.6.)
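
The means and variances quoted in (12.11) and (12.12) can be recomputed from the two frequency tables. The sketch below makes two assumptions that should be kept in mind: the frequency for the class 7.0-7.9 in Table 39 is taken as 559, the value that makes the frequencies total 2100, and Sheppard's correction is applied in its usual form of subtracting one-twelfth of the squared class width from the variance.

```python
# Frequencies f for x = 0, 1, ..., 24 (Table 38).
table38 = [1, 23, 61, 92, 106, 100, 94, 87, 78, 69, 59, 49, 40,
           32, 26, 20, 16, 13, 10, 8, 6, 4, 3, 2, 1]
# Frequencies for the classes 3.0-3.9, ..., 13.0-13.9 (Table 39); 559 is assumed.
table39 = [1, 27, 210, 463, 559, 477, 234, 99, 23, 6, 1]
midpoints = [3.45 + i for i in range(11)]

def mean_and_variance(values, freqs):
    """Mean and variance (divisor = total frequency) of a frequency table."""
    total = sum(freqs)
    mean = sum(v * f for v, f in zip(values, freqs)) / total
    var = sum(f * (v - mean) ** 2 for v, f in zip(values, freqs)) / total
    return mean, var

mu, sigma2 = mean_and_variance(range(25), table38)
mean39, var39 = mean_and_variance(midpoints, table39)
var39_corrected = var39 - 1 / 12      # Sheppard's correction for class width 1

print(round(mu, 3), round(sigma2, 2))             # compare with (12.11)
print(round(sigma2 / 10, 3))                      # expected variance of means of 10
print(round(mean39, 2), round(var39_corrected, 3))  # compare with (12.12)
```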

12.3 Significance of Means. If we suppose that the mean $\bar{x}$ of a sample
of $N$ is normally distributed with mean $\mu$ and variance $\sigma^2/N$, we can use the
tables of the normal law to estimate the probability of a difference at least as
great as that observed between $\bar{x}$ and an assumed value of $\mu$.

Example 1. If a “true” die is rolled 50 times, what is the probability that the average
number of spots obtained will be at least 4?

The probability for $x$ spots in a single throw is assumed to be 1/6 for all values of $x$. This
is implied in the statement that the die is true. The population consists of the infinitely
many throws which are at least conceivable with the given die. For this population,

$\mu = \sum_{x=1}^{6} x f(x) = \tfrac{1}{6}\sum_{x=1}^{6} x = 3.5$

and

$\sigma^2 = \sum_{x=1}^{6} x^2 f(x) - \mu^2 = \tfrac{1}{6}\sum_{x=1}^{6} x^2 - (3.5)^2 = \tfrac{1}{36}(6 \cdot 7 \cdot 13) - (3.5)^2 = \tfrac{35}{12}$

(see Theorem 6, §4.2). Therefore if $\bar{x}$ is the mean of $x$ for a sample of 50,

$\mu_{\bar{x}} = 3.5$

$\sigma_{\bar{x}}^2 = 35/(12 \cdot 50) = 7/120 = 0.05833$

If the distribution of $\bar{x}$ is normal, the standard normal variate will be

$z = (\bar{x} - 3.5)/(0.05833)^{1/2} = (\bar{x} - 3.5)/0.2415$

For $\bar{x} = 4$,    $z = 0.5/0.2415 = 2.07$

and the probability of a value of $z$ at least as great as 2.07 is 0.0192. This is the probabil-
ity required.

Note that from the wording of the question we are concerned only with the upper tail of
the normal distribution. If we want the probability that the observed mean differs from
the expected value 3.5 by at least 0.5 either way, the probability obtained above must be
doubled.
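
Example 1 can be followed step by step in code, with the normal tail area taken from the complementary error function rather than from a table. This is only a sketch; the helper name `normal_upper_tail` is an assumption of the illustration.

```python
from math import erfc, sqrt

def normal_upper_tail(z):
    """Probability that a standard normal variate is at least z."""
    return 0.5 * erfc(z / sqrt(2))

faces = range(1, 7)
mu = sum(x / 6 for x in faces)                  # 3.5
sigma2 = sum(x * x / 6 for x in faces) - mu**2  # 35/12

N = 50
sd_mean = sqrt(sigma2 / N)                      # about 0.2415
z = (4 - mu) / sd_mean
one_tail = normal_upper_tail(z)

print(round(z, 2), round(one_tail, 4))  # compare with 2.07 and 0.0192 in the text
print(round(2 * one_tail, 4))           # two-tailed probability
```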

12.4 Confidence Interval for Means. There are two parameters in the
equation of the normal law, $\mu$ and $\sigma$, but if we suppose that $\sigma$ is fixed, we can
estimate confidence limits for $\mu$ from the mean $\bar{x}$ of a sample. Since $\bar{x}$ is
normal with mean $\mu$ and standard deviation $\sigma/N^{1/2}$, we have only to put

(12.13)    $z = \dfrac{\bar{x} - \mu}{\sigma/N^{1/2}} = \pm 1.96$

to obtain the 95% confidence limits for $\mu$. This equation may be written

(12.14)    $\mu = \bar{x} \pm 1.96\,\sigma/N^{1/2}$

and it is clear that the width of the confidence interval (between the upper
and lower confidence limits) is constant for all values of $\bar{x}$, as illustrated
in Fig. 45.

If $\sigma$ is not known it must be estimated from the sample. For large values
of $N$, $\sigma$ may be taken as the sample standard deviation $s$. For small $N$ a
better estimate is $s[N/(N - 1)]^{1/2}$, but unless $\sigma$ is known, the equation (12.14)
should not be used for small values of $N$, say less than 30 or 40. A better
method for small samples is given in §12.10.

Example 2. The mean score for a randomly selected group of 50 students on an aptitude
test was 493, with a standard deviation of 98. What are the 95% confidence limits for the
mean score in the large population of students who took the test?

Here $\bar{x} = 493$, $s = 98$. Assuming that $\sigma = 98\sqrt{50/49} = 99$, the confidence limits are
$\mu = 493 \pm 1.96 \times 99/\sqrt{50} = 493 \pm 27$, so that when we fix the population mean, on the
basis of this one sample, as lying between 466 and 520, we make a statement which, as one
of many such statements, stands a 5% chance of being wrong.
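
Equation (12.14), with $\sigma$ estimated as $s[N/(N - 1)]^{1/2}$, reproduces the limits of Example 2. The function below is a sketch with a hypothetical name; it is meant for large samples only, as the text cautions.

```python
from math import sqrt

def large_sample_limits(mean, s, N, z=1.96):
    """95% limits for the population mean from (12.14), with sigma estimated from s."""
    sigma_hat = s * sqrt(N / (N - 1))      # estimate of sigma from the sample
    half_width = z * sigma_hat / sqrt(N)
    return mean - half_width, mean + half_width

lower, upper = large_sample_limits(493, 98, 50)
print(round(lower), round(upper))  # compare with 466 and 520 in the text
```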


12.5 Distribution of the Sample Sum. Instead of the mean $\bar{x}$ we may be
more interested in the sum $\sum x$ of all the $x$-values in the sample. Since
$\sum x = N\bar{x}$, the distribution of $\sum x$ is of the same shape as that of $\bar{x}$, but with
a mean and standard deviation $N$ times as great. The mean of $\sum x$ is there-
fore $N\mu$ and the variance is $N\sigma^2$. Thus, in Example 1 of §12.3, the total num-
ber of spots in 50 throws of the die will have a mean 50 × 3.5 = 175 and a
variance 50 × 35/12 = 146.

As an illustration of the distribution of the sum we shall give an alternative
and simpler derivation of equations (10.15) and (10.16) for the mean and
variance of the binomial distribution.

Consider a variable $v$ which can take only two values 0 and 1, corresponding
to “failure” and “success,” and suppose the probability of 1 is $\theta$. Then the
probability of 0 is $1 - \theta$. For the distribution of $v$ we have, therefore,

(12.15)
    $\mu = (1 - \theta) \cdot 0 + \theta \cdot 1 = \theta$
    $\sigma^2 + \mu^2 = (1 - \theta) \cdot 0^2 + \theta \cdot 1^2 = \theta$

so that $\sigma^2 = \theta - \theta^2 = \theta(1 - \theta)$.

Now the sum of the values of $v$ in $s$ trials will be the number of successes
in $s$ trials, since each success contributes 1 to the value of $\sum v$ and each failure
contributes 0.

The distribution of the number of successes $x$, in the binomial distribution,
is therefore the same as the distribution of $\sum v$. By the result stated pre-
viously, the mean of $\sum v$ is $s$ times the mean of $v$ (that is, $s\theta$) and the variance
is $s$ times the variance of $v$ (that is, $s\theta(1 - \theta)$), in agreement with the results
deduced earlier. The same method can obviously be used to find higher
moments of the binomial distribution, if these are required.
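
The agreement claimed here can be confirmed by summing over the binomial probabilities directly. The values $s = 12$ and $\theta = 1/3$ in this sketch are hypothetical, and exact fractions are used so that the comparison is free of rounding.

```python
from fractions import Fraction
from math import comb

def binomial_mean_var(s, theta):
    """Mean and variance of the number of successes, by direct summation."""
    pmf = [comb(s, x) * theta**x * (1 - theta)**(s - x) for x in range(s + 1)]
    mean = sum(x * p for x, p in enumerate(pmf))
    var = sum(x * x * p for x, p in enumerate(pmf)) - mean**2
    return mean, var

s, theta = 12, Fraction(1, 3)          # hypothetical number of trials and probability
mean, var = binomial_mean_var(s, theta)

# Sum of s independent 0/1 variates: s times the mean and variance of one variate.
print(mean == s * theta, var == s * theta * (1 - theta))
```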

12.6 Correction for Finite Population. If a sample of size $N$ is drawn
from a population of size $M$, there is a finite number $C(M, N)$ of different
samples which can be obtained. For example, if there are 20 numbered
chips in a bowl and a sample of 5 is drawn, the number of different possible
samples is 15,504. Each sample will have its own mean $\bar{x}$, given by adding
$N$ values of $x$ and dividing the sum by $N$. Let the mean of all the sample
means be denoted by $\mu_{\bar{x}}$. Then the sum of all the sample means is $C(M, N)\mu_{\bar{x}}$,
and the sum of all samples is $N\,C(M, N)\mu_{\bar{x}}$. But in adding all the possible
samples every individual in the population must be included as often as every
other, and since there are $M$ individuals in the population and the total
number of individuals in all samples is $N\,C(M, N)$, the number of times
that each must be included is $N\,C(M, N)/M$. Therefore

$N\,C(M, N)\,\mu_{\bar{x}} = \dfrac{N\,C(M, N)}{M} \cdot S$

where $S$ is the sum of all the individuals in the population and therefore is
equal to $M\mu$. It follows that

(12.16)    $\mu_{\bar{x}} = \mu$

A similar but rather more complicated argument will show that

(12.17)    $\sigma_{\bar{x}}^2 = \dfrac{\sigma^2}{N} \cdot \dfrac{M - N}{M - 1}$

so that the correction to the variance necessitated by the finite size of the
population is the same as that given for the binomial distribution in §11.10.

For large values of $M$ compared with $N$, the factor $(M - N)/(M - 1)$ is
practically equal to 1, and then (12.17) reduces to (12.2). Thus, for the
illustration in §12.2, dealing with samples of 10 from 1000 numbered discs,
$(M - N)/(M - 1) = 0.991$, and the correction is therefore unimportant.

Example 3. In a freshman class of 180 students, the mean score on a test was 58, and
the standard deviation was 12. If a group of 45 of these students is selected at random,
what is the probability that the mean score for the group will be 60 or more?

Here we have $M = 180$, $N = 45$, $\mu = 58$, $\sigma = 12$, so that

$\mu_{\bar{x}} = 58$

$\sigma_{\bar{x}} = \dfrac{12}{\sqrt{45}}\left(\dfrac{180 - 45}{180 - 1}\right)^{1/2} = 1.554$

The $z$ value corresponding to $\bar{x} = 60$ is therefore $z = 2/1.554 = 1.287$, and the proba-
bility of a value at least as great as this is 0.099.
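
Both (12.16) and (12.17) can be verified by listing every possible sample from a small population, and Example 3 then follows from (12.17). In the sketch below the six chip values are hypothetical, the population variance uses the divisor $M$, and the normal tail is computed with the complementary error function.

```python
from fractions import Fraction
from itertools import combinations
from math import erfc, sqrt

# Hypothetical finite population of M = 6 chips, sampled N = 3 at a time.
population = [2, 3, 5, 8, 13, 21]
M, N = len(population), 3

mu = Fraction(sum(population), M)
sigma2 = sum((x - mu) ** 2 for x in population) / M      # divisor M

means = [Fraction(sum(c), N) for c in combinations(population, N)]  # C(M, N) samples
mu_mean = sum(means) / len(means)
var_mean = sum((m - mu_mean) ** 2 for m in means) / len(means)

print(mu_mean == mu)                                           # (12.16)
print(var_mean == sigma2 / N * Fraction(M - N, M - 1))         # (12.17)

# Example 3: M = 180, N = 45, mu = 58, sigma = 12.
sd_group = 12 / sqrt(45) * sqrt((180 - 45) / (180 - 1))
z = (60 - 58) / sd_group
tail = 0.5 * erfc(z / sqrt(2))
print(round(sd_group, 3), round(z, 3), round(tail, 3))
```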

12.7 Significance of Difference of Means in Large Samples. Suppose
we have two samples of sizes $N_1$ and $N_2$ with means $\bar{x}_1$ and $\bar{x}_2$. If both sam-
ples are supposed drawn at random from the same population with mean $\mu$
and variance $\sigma^2$, the difference $\bar{x}_1 - \bar{x}_2$ will be distributed with mean 0 and
variance $\sigma^2/N_1 + \sigma^2/N_2$. This follows from Theorem 1 in §11.6. If both
$N_1$ and $N_2$ are large enough for the means $\bar{x}_1$ and $\bar{x}_2$ to be practically normal
(and for any $N_1$ and $N_2$ if the parent population is itself normal), the differ-
ence $\bar{x}_1 - \bar{x}_2$ will also be normal. That is to say, if

(12.18)    $z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sigma\left(\dfrac{1}{N_1} + \dfrac{1}{N_2}\right)^{1/2}}$

$z$ is a standard normal variate and the significance of a deviation from 0 can
be assessed in the usual way. If the parent population is finite, of size $M$,
$\sigma^2/N_1$ is replaced by $\dfrac{\sigma^2}{N_1} \cdot \dfrac{M - N_1}{M - 1}$ and $\sigma^2/N_2$ by $\dfrac{\sigma^2}{N_2} \cdot \dfrac{M - N_2}{M - 1}$.

Usually $\sigma^2$ is unknown and has to be estimated. An unbiased estimate of
$\sigma^2$ is

(12.19)    $\hat{\sigma}^2 = [N_1 s_1^2 + N_2 s_2^2]/(N_1 + N_2 - 2)$

where $s_1$ and $s_2$ are the standard deviations of the samples.

For small samples the distribution of $z$ when $\hat{\sigma}$ is substituted for $\sigma$ is no
longer normal, even for a normal parent population. The correct procedure
is described in §12.11.

12.8 Student's t-distribution. We have seen that, for samples from a
normal parent population, the quantity

$z = \dfrac{\bar{x} - \mu}{\sigma/N^{1/2}}$

is a normal standard variate. In most practical situations $\sigma$ is unknown. If instead of $\sigma$
we substitute the estimate $[N/(N - 1)]^{1/2} s$, where $s$ is the standard deviation of
the sample, then, as was first shown by W. S. Gosset, writing under the pen
name of “Student” in a paper that has now become classic,* the variable

$t = \dfrac{\bar{x} - \mu}{s/(N - 1)^{1/2}}$

has a distribution which can be represented mathematically,
and for which tables like those of the normal law can be calculated. The
numerator of $t$ is normally distributed with mean zero, and the denominator
is an estimate, from the sample, of the standard deviation of the numerator.

The important point about the distribution of $t$ is that it depends only on
the sample size $N$ and not on the variance $\sigma^2$ of the population. In fact, the
probability of a value of $t$ between $t$ and $t + dt$ is given by

(12.20)    $f(t)\,dt = K\left(1 + \dfrac{t^2}{N - 1}\right)^{-N/2} dt$

where $K$ is a certain function of $N$. The curve of $f(t)$, plotted against $t$, is
a symmetrical, hump-backed curve, not unlike the normal curve in shape but
with higher tails. The curve for $N = 5$ is compared with the normal curve
of equal variance in Fig. 46. As $N$ increases the curve approaches the normal
curve more and more closely. Its variance is $(N - 1)/(N - 3)$.

The probability of a value greater than a given value of $t$ is

(12.21)    $\int_t^{\infty} f(t)\,dt = 1 - F(t) = P$

where $F(t)$ is the distribution function for $t$. Table II in the
Appendix gives values of $t$ corresponding to selected values of
$P$ for $n$ ($= N - 1$) between 1 and 30. The relation between
$t$ and $P$ is illustrated in Fig. 46. The probability of a value at
least as great numerically as the given $t$ is double the probability
stated. This double probability must be used in making a two-tailed test.

Fig. 46. Student's t (n = 4)

For values of $n$ larger than 30, the quantity

$z = t\left(\dfrac{n - 2}{n}\right)^{1/2} = (\bar{x} - \mu)(n - 2)^{1/2}/s$

may be taken as approximately a standard normal variate.

* See Reference 1. “Student” actually used the variable $(\bar{x} - \mu)/s$, which is not essen-
tially different from $t$.
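
The tail area $P$ of (12.21) can be approximated numerically from the density (12.20). The sketch below rests on assumptions not stated in the text: the constant $K$ is taken in its standard form $\Gamma(N/2)\big/\{[(N - 1)\pi]^{1/2}\,\Gamma((N - 1)/2)\}$, and the integral is evaluated by Simpson's rule on $[0, t]$ and subtracted from 1/2, using the symmetry of the curve. The function names are hypothetical.

```python
from math import gamma, pi, sqrt

def t_density(t, N):
    """f(t) of (12.20) for sample size N, that is, n = N - 1 degrees of freedom."""
    n = N - 1
    K = gamma(N / 2) / (sqrt(n * pi) * gamma(n / 2))   # assumed standard form of K
    return K * (1 + t * t / n) ** (-N / 2)

def t_upper_tail(t, N, steps=2000):
    """P of (12.21) for t >= 0: one half minus the area under f between 0 and t."""
    h = t / steps                                      # steps must be even for Simpson's rule
    total = t_density(0.0, N) + t_density(t, N)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, N)
    return 0.5 - total * h / 3

P = t_upper_tail(2.262, 10)     # n = 9
print(round(P, 4))              # one-tail probability; doubling it gives the two-tailed value
```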

12.9 Degrees of Freedom. The quantity $n$ used in Table II (Appendix)
is called the number of degrees of freedom. The meaning of degrees of freedom
is not easy to explain at an elementary level, but the general idea is that the
$N$ deviations $x_i - \bar{x}$ which are used in calculating the sample variance $s^2$
are not all independent, since there is a linear relation connecting them,
namely,

$\sum_{i=1}^{N} (x_i - \bar{x}) = 0$

Because of this, only $N - 1$ of the quantities are actually independent, and
we say that there are $N - 1$ degrees of freedom in the calculation of $s$. An
unbiased estimate of the population variance is

$Ns^2/(N - 1) = \sum_{i=1}^{N} (x_i - \bar{x})^2/(N - 1)$

and is given by dividing the sum of squares of deviations from the mean by
the number of degrees of freedom. For a good elementary discussion of the
notion of degrees of freedom, the student may consult an article by Prof.
Helen Walker (Reference 3). See also Part Two, page 162.

12.10 Confidence Limits for the Mean, for Small Samples. Given a
sample, of size $N$, with mean $\bar{x}$ and standard deviation $s$, we can use the
table of Student's $t$ to find confidence limits for the mean $\mu$ of the population,
on the assumption that $x$ is normally distributed in this population. Writing
$n$ for $N - 1$, we have

(12.22)    $t = (\bar{x} - \mu)\,n^{1/2}/s$

For a given $n$ we can find in the table the value of $t$ corresponding to $P = 0.025$.
The probability of a value of $t$ numerically greater than this is 0.05, so that
this value, substituted in (12.22), will give the 95% confidence limits for $\mu$.
Thus, if $n = 9$, $t = 2.262$, and we have

$\dfrac{\bar{x} - \mu}{s} = \pm\dfrac{2.262}{\sqrt{9}}$

or

(12.23)    $\mu = \bar{x} \pm 0.754s$

Example 4. For a sample of 10 rats, the mean blood viscosity reading was 3.93 and the
standard deviation was 0.552. State 95% confidence limits for the blood viscosity reading
in the population of rats sampled.

From (12.23),    $\mu = 3.93 \pm 0.416$

so that the confidence limits are 3.51 and 4.35. The constant in (12.23) is different for each
different sample size. For large samples, it approximates $1.96/n^{1/2}$.
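
Example 4 amounts to solving (12.22) for $\mu$ at $t = \pm 2.262$. The sketch below uses a hypothetical function name and takes the tabled $t$ value as an argument, since it must be looked up for each $n$.

```python
from math import sqrt

def small_sample_limits(mean, s, N, t_value):
    """Confidence limits for mu from (12.22), with n = N - 1 degrees of freedom."""
    n = N - 1
    half_width = t_value * s / sqrt(n)     # 0.754 * s when n = 9 and t = 2.262
    return mean - half_width, mean + half_width

lower, upper = small_sample_limits(3.93, 0.552, 10, 2.262)
print(round(lower, 2), round(upper, 2))  # compare with 3.51 and 4.35 in the text
```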

12.11 Confidence Limits for the Difference of Means, for Small Samples.

If we have two samples of sizes $N_1$ and $N_2$, means $\bar{x}_1$ and $\bar{x}_2$, and standard
deviations $s_1$ and $s_2$, we can form confidence limits for the difference of the
means $\mu_1 - \mu_2$ in the two populations from which the samples are supposed
to be taken, on the hypothesis that these two populations have the same
variance. If the confidence limits include the value zero, we can say that
the means of the two populations are not significantly different, or, in other
words, the hypothesis that the two samples come from populations with the same
mean is not rejected at the chosen level.

If we denote the degrees of freedom for the two samples by $n_1$ and $n_2$, we
can prove that the quantity

(12.24)    $t = [\bar{x}_1 - \bar{x}_2 - (\mu_1 - \mu_2)]/\hat{\sigma}_{12}$

where $\hat{\sigma}_{12}^2$ is an unbiased estimate of the variance of the numerator, is distrib-
uted as Student's $t$ with $n_1 + n_2$ degrees of freedom.

On the hypothesis of a common $\sigma^2$, the variance of $\bar{x}_1$ is $\sigma^2/N_1$, and that of
$\bar{x}_2$ is $\sigma^2/N_2$, so that the variance of $\bar{x}_1 - \bar{x}_2$ (the samples being independent) is

$\sigma^2(N_1 + N_2)/N_1 N_2$

The estimate of $\sigma^2$ is taken as a weighted mean of estimates from the two
samples separately, weighted according to their respective degrees of freedom.
The estimate from the first sample is $N_1 s_1^2/n_1$ and that from the second is
$N_2 s_2^2/n_2$, so that

(12.25)    $\hat{\sigma}^2 = (N_1 s_1^2 + N_2 s_2^2)/(n_1 + n_2)$

Then $\hat{\sigma}_{12}^2$ is given by

$\hat{\sigma}_{12}^2 = \hat{\sigma}^2 (N_1 + N_2)/N_1 N_2 = \dfrac{N_1 s_1^2 + N_2 s_2^2}{n_1 + n_2} \cdot \dfrac{N_1 + N_2}{N_1 N_2}$

With this value of $\hat{\sigma}_{12}$, $t$ in (12.24) has the Student t-distribution, and
confidence limits can be found by putting $t$ equal to the appropriate value
corresponding to $n_1 + n_2$ degrees of freedom.


Example 5. Samples of two types of electric light bulbs were tested for length of life,
and the following data were recorded:

Type 1: $N_1 = 5$, $\bar{x}_1 = 1224$ hr, $s_1 = 36$ hr
Type 2: $N_2 = 7$, $\bar{x}_2 = 1036$ hr, $s_2 = 40$ hr

Is the difference in the means sufficient to warrant the conclusion that Type 1 is superior to
Type 2 in respect of length of life?

We assume that the standard deviation of life for the two types is the same. This may
seem somewhat arbitrary, seeing that we do not suppose the means to be the same, but it
may well happen that an improvement in construction increases the length of life of a lamp
without much affecting its variability. Unless the assumption is approximately true, the
t-test is not valid.

Here

$\hat{\sigma}^2 = (5 \times 1296 + 7 \times 1600)/10 = 1768$

$\hat{\sigma}_{12}^2 = 1768 \times 12/35 = 606.2$

and

$t = [1224 - 1036 - (\mu_1 - \mu_2)]/(606.2)^{1/2} = [188 - (\mu_1 - \mu_2)]/24.6$

The number of degrees of freedom for $t$ is 4 + 6 = 10. The 95% value of $t$ is 2.228, so that
the 95% confidence limits are given by

$\mu_1 - \mu_2 = 188 \pm 24.6 \times 2.228 = 188 \pm 54.8$

and are therefore 133 and 243. Since these limits do not include zero, we can say that at the
5% level there is a significant difference between the means. If we feel sure that Type 1
cannot be worse than Type 2, but may be better, we shall use a one-tailed test, and the
95% value of $t$ will be 1.812. The confidence limits for $\mu_1 - \mu_2$ are then
$188 \pm 24.6 \times 1.812$, or 143 to 233.

If we want 99% confidence limits, then, for the two-tailed test, $t = 3.169$, and the
confidence limits are

$\mu_1 - \mu_2 = 188 \pm 78$

or 110 to 266. These limits still do not include zero, so that even at the 1% level the ob-
served difference is significant.
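
The steps of Example 5 can be retraced numerically. The sketch below assumes, as the text does, a common variance in the two populations; the function name `pooled_difference_limits` is ours, and the $t$ value is taken from the table, not computed.

```python
import math

def pooled_difference_limits(N1, mean1, s1, N2, mean2, s2, t_value):
    """Confidence limits for mu1 - mu2 under a common variance, per (12.24)-(12.25).

    s1 and s2 are sample standard deviations with divisor N (the text's convention).
    """
    n1, n2 = N1 - 1, N2 - 1
    var_hat = (N1 * s1**2 + N2 * s2**2) / (n1 + n2)       # pooled estimate (12.25)
    se_diff = math.sqrt(var_hat * (N1 + N2) / (N1 * N2))  # estimated s.d. of the difference
    diff = mean1 - mean2
    return diff - t_value * se_diff, diff + t_value * se_diff, se_diff

# Example 5: tabulated two-tailed 5% value of t for 10 degrees of freedom is 2.228.
lo95, hi95, se = pooled_difference_limits(5, 1224, 36, 7, 1036, 40, 2.228)
print(round(se, 1), round(lo95), round(hi95))
```

Replacing 2.228 by 3.169 gives the 99% limits in the same way.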
12.12 Significance of Differences in Paired Samples. In some problems
we are concerned with the effect of two different procedures or treatments
carried out on the same sample of individuals. Instead of trying out one
treatment on one random sample and the other treatment on an independent
random sample, we try out both treatments on each member of the same
sample. In this way we render the test more precise, because we eliminate
a number of sources of variation which might possibly affect the quantity
we are measuring. We ask now whether the mean of the individual differ-
ences between the $x$ score on one treatment and that on the other treatment
is significantly different from zero.

The null hypothesis is that there is no difference, in the population as a
whole, between the $x$ scores on the two treatments. We form a set of differ-
ences $d_i = x_{2i} - x_{1i}$ between the scores of the $i$th individual in the sample
on the two treatments and assume that these differences are normally distrib-
uted in the population with mean zero. We then can use the $t$-test to find
whether the observed mean of the $d_i$ is significantly different from zero.

Example 6. Table 40 (Smith and Medlicott) shows the hemoglobin (gm/100 ml of blood)
in anemic rats before and after 4 weeks of added iron in the diet (0.5 mg/day). The mean
value of $d$ is 0.825 and the standard deviation is 1.70. The number of degrees of freedom is
11 (one less than the number in the sample), so that, by (12.22),

$t = (0.825 - 0)(11)^{1/2}/1.70 = 1.61$

The probability of a value of $t$ at least as great numerically as this is between 0.1 and 0.2.
The one-sided probability is between 0.05 and 0.1, and it would be reasonable to use a one-
sided test on the principle that any difference in hemoglobin due to increased iron in the diet
could only be an increase. However, even on this interpretation, the observed effect is
non-significant.


Table 40. Effect of Iron in Diet on Hemoglobin

Rat No.    $x_1$ (before)    $x_2$ (after)    $d = x_2 - x_1$

   1           3.4              4.9              1.5
   2           3.0              2.3             -0.7
   3           3.0              3.1              0.1
   4           3.4              2.1             -1.3
   5           3.7              2.6             -1.1
   6           4.0              3.8             -0.2
   7           2.9              5.8              2.9
   8           2.9              7.9              5.0
   9           3.1              3.6              0.5
  10           2.8              4.1              1.3
  11           2.8              3.8              1.0
  12           2.4              3.3              0.9
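
The paired comparison of Example 6 can be reproduced from the figures of Table 40. The following is a minimal sketch using only the Python standard library; the variable names are ours, and the standard deviation is taken with divisor $N$ to match the convention of (12.22).

```python
import math
import statistics

# Hemoglobin readings from Table 40 (gm/100 ml), rats 1 to 12.
before = [3.4, 3.0, 3.0, 3.4, 3.7, 4.0, 2.9, 2.9, 3.1, 2.8, 2.8, 2.4]
after = [4.9, 2.3, 3.1, 2.1, 2.6, 3.8, 5.8, 7.9, 3.6, 4.1, 3.8, 3.3]

d = [x2 - x1 for x1, x2 in zip(before, after)]  # individual differences
d_mean = statistics.fmean(d)
d_sd = statistics.pstdev(d)                     # divisor N, as in the text
n = len(d) - 1                                  # degrees of freedom
t = (d_mean - 0) * math.sqrt(n) / d_sd          # (12.22) with hypothetical mean zero

print(round(d_mean, 3), round(d_sd, 2), round(t, 2))
```

The value of $t$ obtained is then compared with the tabulated values for 11 degrees of freedom, exactly as in the example.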


12.13 Distribution of the Sample Variance. It is proved in Part Two
that the mean of the distribution of $s^2$ in samples of size $N$ (from a parent
population which is not necessarily normal) is equal to $(N - 1)\sigma^2/N$, where
$\sigma^2$ is the variance of the population. This fact has already been used in form-
ing an estimate of $\sigma^2$ from an observed $s^2$. It is also proved there that the
variance of the distribution of $s^2$ is

$\frac{N - 1}{N^3}\,[(N - 1)\mu_4 - (N - 3)\mu_2^2]$

where $\mu_2$ and $\mu_4$ are the second and fourth moments for the population. If the
distribution in the parent population is normal, $\mu_4 = 3\mu_2^2$, and the variance of
$s^2$ then is given by

(12.26) $\mathrm{Var}(s^2) = 2\sigma^4 (N - 1)/N^2$

Usually $\sigma^2$ would be estimated by $N s^2/(N - 1)$ and the estimate of $\mathrm{Var}(s^2)$
would then be

(12.27) $\mathrm{Var}(s^2) = 2s^4/n$

where $n = N - 1$. The square root of the estimated variance of a quantity
is usually called its standard error.*

The distribution of $s^2$ is skew. It can be shown that when the parent
population is normal the quantity $N s^2/\sigma^2$ is distributed like a variate called
$\chi^2$ (Greek chi square) which is of great importance in a number of statistical
problems. (The symbol $\chi^2$ rather than $\chi$ is used because it is a quantity
which can never be negative.) The graphs of $\chi^2$ for two values of $n$ are
shown in Fig. 47. The area of the righthand tail of the curve beyond a given
value of $\chi^2$ is the probability $P$ of a value of $\chi^2$ greater than the given value.
Table III in the Appendix gives for selected values of $P$ the corresponding
values of $\chi^2$ and from these the significance of an observed value can be
judged. The table of $\chi^2$ enables us also to set confidence limits for $\sigma^2$ from
the observed value of $s^2$ in a sample. Thus for 90% confidence limits, we
find from the table the two values of $\chi^2$ which correspond to $P = 0.95$ and
$P = 0.05$. For $N = 10$ (and therefore $n = 9$) the two values of $\chi^2$ are 3.325
and 16.919. The confidence limits for $\sigma^2$ are given by putting $N s^2/\sigma^2$ equal
to these two values, and so are $10s^2/3.325$ and $10s^2/16.919$, that is, $3.01s^2$
and $0.59s^2$.

Fig. 47. Distribution of $\chi^2$

* The term "probable error" occurs in the older literature for the standard error multi-
plied by 0.6745. If the distribution is normal, or nearly so, the probability is 0.5 that the
variate lies within limits equal to the mean $\pm$ the probable error (Cf. §8.5).
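
The confidence limits for the variance follow from one division each, as the short sketch below illustrates. The function name `variance_confidence_limits` is ours, and the two values of chi square are assumed to be read from Table III (here 3.325 and 16.919 for $n = 9$).

```python
def variance_confidence_limits(s2, N, chi2_lower, chi2_upper):
    """Limits for sigma^2 found by setting N*s^2/sigma^2 equal to each tabulated value.

    chi2_lower and chi2_upper are the tabulated points for P = 0.95 and P = 0.05
    (for 90% limits); they are inputs, not computed here.
    """
    return N * s2 / chi2_upper, N * s2 / chi2_lower

# With s^2 = 1 the results are the multipliers of s^2 quoted in the text for N = 10.
low, high = variance_confidence_limits(1.0, 10, 3.325, 16.919)
print(round(low, 2), round(high, 2))
```

Note that the larger tabulated value gives the lower limit for the variance, and conversely.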

12.14 Significance of a Ratio of Two Variances. The F-distribution.

Suppose we have two independent random samples of sizes $N_1$ and $N_2$ and
variances $s_1^2$ and $s_2^2$. We may wish to test whether the null hypothesis, that
is, that the two samples come from populations with the same variance $\sigma^2$, is
justified. We make this assumption, for instance, in testing the significance
of the difference of the means, and it would be satisfactory, before going
ahead with the $t$-test, to be assured that the assumption is a reasonable one.

It turns out that instead of using the difference of the two variances it is
more convenient to use their ratio. We find that the quantity

(12.28) $F = \frac{N_1 s_1^2/n_1}{N_2 s_2^2/n_2}$

which is a ratio of the estimate of $\sigma_1^2$ from the first sample to the estimate of
$\sigma_2^2$ from the second, has a distribution which, on the hypothesis that the two
samples are from the same population with variance $\sigma^2$, is independent of $\sigma^2$
and depends only on the two numbers $n_1$ and $n_2$. As would be expected, $F$
fluctuates around the value 1; the mean of the distribution is $n_2/(n_2 - 2)$, for
$n_2 > 2$. Tables of the area under the curve of $f(F)$ beyond a given value of
$F$ enable us to find the probability $P$ that $F$ will exceed the observed value,
and hence to judge the significance of an observed ratio of $s_1^2$ to $s_2^2$. Such
tables are given in Table IV of the Appendix. Since a complete table, giving
$P$ for all reasonable values of $F$, $n_1$ and $n_2$, would be very bulky, the table
gives only the values of $F$ corresponding to two selected values of $P$, namely,
0.05 and 0.01. These are called the 5% points and the 1% points, respec-
tively.

The null hypothesis is that the ratio of $\sigma_1^2$ to $\sigma_2^2$ is equal to 1. If we are
going to reject this hypothesis both when $F$ is too great and when $F$ is too
small, the 5% point really corresponds to a 10% level of significance, since
we are using a two-tailed test, and each tail has an area of 5%. If, on the
other hand, the alternative hypothesis is that $\sigma_1^2/\sigma_2^2 > 1$, we use a one-tailed
test and then the 5% point corresponds to a 5% level of significance. This
is the usual situation in problems of Analysis of Variance, which provide the
main occasions for using Table IV.

Example 7. In two samples, of sizes 6 and 11, the observed standard deviations of a
variable $x$ are 3.6 and 2.0. Is this difference significant?

Here

$F = \frac{6(3.6)^2}{5} \cdot \frac{10}{11(2.0)^2} = 3.53$

with $n_1 = 5$, $n_2 = 10$.

From the table we see that the 5% point is 3.33 and the 1% point is 5.64. The difference
is therefore significant at a level a little below 10%, if we have no reason to believe that
either population has the greater variance. If we feel sure that, provided there is any
difference, the population from which sample 1 is taken will have the larger variance, the
observed difference is significant at a level below 5%.

Table IV gives the $F$ values at the stated levels for the upper tail of the distribution.
Values for the lower tail can be obtained, if desired, by taking the reciprocal of $F$, at the
same time interchanging the degrees of freedom $n_1$ and $n_2$. Thus the upper 5% point for
$n_1 = 5$, $n_2 = 10$, is 3.33, and the lower 5% point is 1/4.74 = 0.211. Here 4.74 is the
tabulated point for $n_1 = 10$ and $n_2 = 5$. However, in practice we seldom need the lower
tail, as we can always arrange the ratio $F$ so that the estimate in the numerator is greater
than that in the denominator.
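
The variance ratio for two samples of sizes 6 and 11 with standard deviations 3.6 and 2.0 can be computed as follows. This is a sketch only: the function name `variance_ratio` is ours, each variance estimate is taken as $N s^2/n$ with $s$ on divisor $N$, and the 5% point 4.74 is assumed to be read from Table IV.

```python
def variance_ratio(N1, s1, N2, s2):
    """Ratio of the two variance estimates N*s^2/n, with its degrees of freedom."""
    n1, n2 = N1 - 1, N2 - 1
    estimate1 = N1 * s1**2 / n1
    estimate2 = N2 * s2**2 / n2
    return estimate1 / estimate2, n1, n2

F, n1, n2 = variance_ratio(6, 3.6, 11, 2.0)

# Lower 5% point for (n1, n2) = (5, 10): reciprocal of the tabulated upper point
# for the interchanged degrees of freedom (10, 5), which is 4.74.
lower_5pct = 1 / 4.74

print(round(F, 2), n1, n2, round(lower_5pct, 3))
```

The computed ratio is then set against the tabulated 5% and 1% points for $n_1 = 5$, $n_2 = 10$.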


12.15 Test for Homogeneity of Variance. If we have several samples we
can test the null hypothesis that they all come from populations with the
same variance. Consider, for example, the data in Table 41, on six samples,
each of five items, where the variable is the breaking strength in lb wt of a
specimen of cotton cloth. For each sample an estimate of the population
variance $\sigma^2$ can be calculated, and these estimates, denoted by $\hat{\sigma}_k^2$, are given
in the last row but one of Table 41. They evidently differ quite considerably.


Table 41. Breaking Strength of Cotton Cloth (lb wt)

Sample No.               1        2        3        4        5        6

                       38.6     47.4     47.7     44.6     48.1     45.2
                       46.5     42.3     44.7     49.8     50.6     42.1
Items                  43.1     46.3     46.4     42.1     41.3     51.5
                       47.2     46.0     46.0     47.3     41.3     50.6
                       53.4     48.9     46.4     52.2     48.1     46.4

$\bar{x}_k$                 45.76    46.18    46.24    47.20    45.88    47.16
$\hat{\sigma}_k^2$          29.83     6.00     1.15    16.14    18.52    15.17
$\log_{10}\hat{\sigma}_k^2$   1.475    0.778    0.062    1.208    1.268    1.181


Now it can be shown that if the null hypothesis is true, and if $n_k$ is the
number of degrees of freedom for the $k$th sample, then the quantity $M$ de-
fined by

(12.29) $M = n \log\left(\sum_k n_k \hat{\sigma}_k^2/n\right) - \sum_k n_k \log \hat{\sigma}_k^2$

where $n = \sum_k n_k$, is independent of $\sigma^2$ and can be used to find the probability
of obtaining the observed set of variances if the estimate of $\sigma^2$ from the whole
set of samples is fixed, that is, for a fixed value of $S/n = \sum_k n_k \hat{\sigma}_k^2/n$. The
greater the value of $M$, the smaller this probability, and hence the smaller
the chance of obtaining the heterogeneous collection of variances actually
observed if the samples form a set of random samples from the same popula-
tion. The logarithms in (12.29) are natural logarithms. If common logs
are used, $M$ must be multiplied by 2.303.

Bartlett proved that, if $b$ is the number of samples, and if the $n_k$ are reason-
ably large, $M$ is approximately distributed like $\chi^2$ with $b - 1$ degrees of
freedom, so that the table of $\chi^2$ can be used to test the significance of the
differences between the $\hat{\sigma}_k^2$. For very small samples, down to a size of 4 or
5, it is better to use $M/c$ instead of $M$, where $c$ is a correction factor given by

(12.30) $c = 1 + \left[\sum_k (1/n_k) - 1/n\right]/3(b - 1)$

If all the $n_k$ are equal, $n_k = n/b$, so that

(12.31) $M = 2.303\,n\left[\log_{10}\left(\sum_k \hat{\sigma}_k^2/b\right) - \left(\sum_k \log_{10}\hat{\sigma}_k^2\right)/b\right]$,
        $c = 1 + (b + 1)/3n$

For the data of Table 41, $n_k = 4$, $b = 6$, $n = 24$, $c = 1 + 7/72 = 1.097$.
Also,

$M = 55.26\,[\log_{10}(86.81/6) - (5.971)/6] = 9.13$

so that $M/c = 8.33$. For 5 degrees of freedom, the 10% point for the dis-
tribution of $\chi^2$ is 9.236, so that there is a probability of more than 0.1 of
obtaining a set of variances as unlike as the $\hat{\sigma}_k^2$ on the null hypothesis. The
null hypothesis can therefore be accepted quite reasonably.
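
The computation of $M$ and $c$ may be sketched in Python as below. The function name `bartlett_M` is ours. The items are those of Table 41 as we read them; a few digits are unclear in the printed table and have been chosen so as to agree with the tabulated means and variances, which is an assumption on our part.

```python
import math
import statistics

# Breaking strengths (lb wt), six samples of five items each (Table 41;
# some digits reconstructed from the printed means and variances).
samples = [
    [38.6, 46.5, 43.1, 47.2, 53.4],
    [47.4, 42.3, 46.3, 46.0, 48.9],
    [47.7, 44.7, 46.4, 46.0, 46.4],
    [44.6, 49.8, 42.1, 47.3, 52.2],
    [48.1, 50.6, 41.3, 41.3, 48.1],
    [45.2, 42.1, 51.5, 50.6, 46.4],
]

def bartlett_M(samples):
    """Return (M, c) of (12.29) and (12.30), using natural logarithms."""
    dfs = [len(s) - 1 for s in samples]                    # n_k
    variances = [statistics.variance(s) for s in samples]  # estimates with divisor n_k
    n = sum(dfs)
    b = len(samples)
    pooled = sum(nk * v for nk, v in zip(dfs, variances)) / n  # S/n
    M = n * math.log(pooled) - sum(nk * math.log(v) for nk, v in zip(dfs, variances))
    c = 1 + (sum(1 / nk for nk in dfs) - 1 / n) / (3 * (b - 1))
    return M, c

M, c = bartlett_M(samples)
print(round(M, 2), round(c, 3), round(M / c, 2))
```

The quotient $M/c$ is compared with the chi-square table for $b - 1 = 5$ degrees of freedom; small differences in the last figure from a hand computation are to be expected from rounding of the logarithms.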

12.16 Analysis of Variance. We can look at the data of Table 41 in a dif-
ferent way, asking the question whether the sample means differ among them-
selves more than would be expected from the way in which the individual items
in a single sample differ. If we regard the whole set of $\sum_k (n_k + 1) = n + b$
items as a single sample, we can form the mean $\bar{x}$ of all these items and also
the sum $S_t$ of all the squared deviations from the mean, $\sum_{j,k} (x_{kj} - \bar{x})^2$, where
$x_{kj}$ is the $j$th item in the $k$th sample. $S_t$ is called the "total sum of squares,"
and, on the null hypothesis that all the samples came from populations with
the same mean and the same variance $\sigma^2$, $S_t/(n + b - 1)$ is an estimate of
$\sigma^2$ with $n + b - 1$ degrees of freedom. A second estimate of $\sigma^2$ is obtained by
averaging the separate estimates from the $b$ separate samples, the average
being weighted according to the degrees of freedom $n_k$. This average is the
quantity $\sum_k n_k \hat{\sigma}_k^2/n$ denoted by $S/n$ in the preceding section. Since
$n_k \hat{\sigma}_k^2 = \sum_j (x_{kj} - \bar{x}_k)^2$, $S$ is defined as $\sum_k \sum_j (x_{kj} - \bar{x}_k)^2$. It is called the
"sum of squares within samples," being a measure of the variation within
each sample around its own mean. Finally, we can form a third estimate of
$\sigma^2$ from the variation between the separate means $\bar{x}_k$. The variance of the
mean of a sample of size $n_k + 1$ is equal to the variance of the sample itself
divided by $n_k + 1$, so that if $S_b = \sum_k (n_k + 1)(\bar{x}_k - \bar{x})^2$, the quantity
$S_b/(b - 1)$ is an estimate of $\sigma^2$. $S_b$ is called the "sum of squares between
samples," being a measure of the variation between the means of the different
samples. The estimates $S_b/(b - 1)$ and $S/n$ are independent, and if we make
the further assumption that the variable $x$ is normally distributed in the
populations from which the samples are taken, we can say that the ratio

(12.32) $F = \frac{S_b}{S} \cdot \frac{n}{b - 1}$

has the F-distribution of §12.14. The tables of $F$ can then be used to deter-
mine whether the null hypothesis is justified. If the null hypothesis is re-
jected, because the $F$ value is too large, the conclusion is that the differences
between the means are significant. Often the samples have been subjected
to different treatments, or have been taken at different times, and if so we are
enabled to say whether or not the treatments or the lapse of time has produced
a definite effect on the variate $x$. The three estimates of $\sigma^2$ can be set out in
a table (see Table 42) giving the respective sums of squares and the corre-
sponding degrees of freedom. Since the total variance is analyzed into a
part due to variation between the samples and a part due to variation within
the samples, this process is known as Analysis of Variance. It is a very
powerful tool in statistical investigation, and can be used in much more
complicated situations than the one we have envisaged here. For further
details the student may consult the chapter on the subject in Part Two, or
the textbook by Snedecor (Reference 34 of §0.4).

It can be shown mathematically that $S_t$ is the sum of $S$ and $S_b$. Also the
number of degrees of freedom for $S_t$ is clearly the sum of the degrees of free-
dom for the other two. For the purpose of calculation it is preferable to
put the expressions for the sums of squares in the form:

(12.33)
$S_t = \sum_{j,k} x_{kj}^2 - \left(\sum_{j,k} x_{kj}\right)^2/N$
$S_b = \sum_k \left(\sum_j x_{kj}\right)^2/N_k - \left(\sum_{j,k} x_{kj}\right)^2/N$
$S = S_t - S_b$

where $N = n + b$ = total number in all the samples, and $N_k$ is the number
in the $k$th sample.

As applied to the data of Table 41, we find the values for these sums of
squares given in Table 43, and it is clear that there is no significant difference
between the means of the separate samples. In fact, these means are much
closer together than we should expect, even if they were random samples from
the same population. If it had turned out that the "between" estimate of
$\sigma^2$ was about 2.6 times as great as the "within" estimate, we could have said
that there is a barely significant difference between sample means at the 5%
level, the 5% value for $n_1 = 5$ and $n_2 = 24$ being 2.62.

Table 42. Analysis of Variance

Variation          Sum of Squares of Deviations                      Degrees of Freedom   Estimate of $\sigma^2$

Within Samples     $S = \sum_{j,k} (x_{kj} - \bar{x}_k)^2$            $n$                  $S/n$
Between Samples    $S_b = \sum_k (n_k + 1)(\bar{x}_k - \bar{x})^2$    $b - 1$              $S_b/(b - 1)$
Total              $S_t = \sum_{j,k} (x_{kj} - \bar{x})^2$            $n + b - 1$          $S_t/(n + b - 1)$

Table 43. Analysis of Variance of Data in Table 41

Variation          Sum of Squares    Degrees of Freedom    Estimate of $\sigma^2$

Within Samples        347.25                24                   14.5
Between Samples         9.86                 5                    1.97
Total                 357.11                29                   12.3
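
The sums of squares in the computational form (12.33) can be evaluated directly. The sketch below uses our own function name `one_way_anova` and the items of Table 41 as we read them (a few unclear digits are reconstructed from the printed means and variances, which is an assumption).

```python
def one_way_anova(samples):
    """Return (S, S_b, S_t, F) using the computational forms of (12.33)."""
    all_items = [x for s in samples for x in s]
    N = len(all_items)                    # total number of items, n + b
    b = len(samples)
    correction = sum(all_items) ** 2 / N  # (sum of all items)^2 / N
    S_t = sum(x * x for x in all_items) - correction
    S_b = sum(sum(s) ** 2 / len(s) for s in samples) - correction
    S = S_t - S_b                         # within-samples sum of squares
    n = N - b                             # degrees of freedom within samples
    F = (S_b / (b - 1)) / (S / n)         # ratio (12.32)
    return S, S_b, S_t, F

samples = [
    [38.6, 46.5, 43.1, 47.2, 53.4],
    [47.4, 42.3, 46.3, 46.0, 48.9],
    [47.7, 44.7, 46.4, 46.0, 46.4],
    [44.6, 49.8, 42.1, 47.3, 52.2],
    [48.1, 50.6, 41.3, 41.3, 48.1],
    [45.2, 42.1, 51.5, 50.6, 46.4],
]

S, S_b, S_t, F = one_way_anova(samples)
print(round(S, 2), round(S_b, 2), round(S_t, 2), round(F, 3))
```

A ratio well below 1, as here, means that the sample means differ less than the variation within samples would lead us to expect.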

12.17 Control Charts. Data like those in Table 41 on the breaking 
strength of cotton cloth are often obtained as a routine in factories where 
products are manufactured to a specification, or where they have to conform 
to buyers' standards. The manufacturer of cloth expects a certain amount
of variation in the strength of the product, but he wishes the average strength 
to be maintained and the variation around the average to be kept within 
reasonable bounds. To ensure this, samples are taken regularly and the 
results of measurements on these samples are plotted on control charts, so 
that any unusual features may be readily noted and action taken if necessary 
to bring the process back under control. The usual control charts are for 
the sample mean and the sample range, the latter being used as a quick and 
easy estimate of dispersion. Sometimes sample inspection involves, instead 
of a measurement, merely the decision as to whether a manufactured article 
is defective or not. The proportion of defective articles in the sample can 
be similarly set out in a control chart. 

A typical control chart for means is shown in Fig. 48. The central line
and the upper and lower lines are placed on the chart after a fairly long
sequence of readings has shown that the process is behaving in a reasonably
steady way. The central line is drawn at a point on the vertical scale which
represents an estimate $\hat{\mu}$ of the population mean based on 30 or 40 samples.
The other lines are upper and lower limits for $\bar{x}$ based on the estimated sam-
pling standard deviation of the mean, $\sigma_{\bar{x}}$, and so placed that the probability
is very small that a value of $\bar{x}$ will, by pure chance, fall outside these limits.
How small is a matter of choice. It is customary to place the limits at
$\hat{\mu} \pm 3\sigma_{\bar{x}}$, and the probability is then only about 0.003, but we could, of course,
choose 0.01 or 0.05. The reason for making the limits wide is that the manu-
facturer does not wish to waste time looking for non-existent trouble, and the
chance that he will do so with the $3\sigma_{\bar{x}}$ limits is only about 3 in 1000. Of
course, there is a correspondingly greater chance of failing to recognize trouble
that really exists. The danger signals in a control chart are (1) plotted
points falling outside the control limits, (2) several consecutive points lying
near the limits, although still inside, (3) runs of 7 or more points all above
or all below the central value, or (4) a well-marked trend upward or down-
ward in the plotted points. If conditions (3) or (4) are observed it may be
that new control limits need to be calculated, as the process is now settling
down to a state different from that in which the original control limits were
calculated.

[Fig. 48. Control chart for means; vertical scale in lb wt]

In a control chart for the range, the $3\sigma_R$ limits and also the 99% limits are
given for a few sample sizes in Table 44, in terms of the mean $\bar{R}$ obtained
from the preliminary set of samples. The values are based on the assumption
that $\bar{R}$ is the true mean range for all possible samples of the given size selected
from the population, which is supposed to be normal.

Table 44. Control Chart Limits for Range

                 99% values of $R/\bar{R}$      $3\sigma$ values of $R/\bar{R}$
Sample Size      Lower       Upper              Lower       Upper

     4           0.165       2.28               0           2.282
     5           0.237       2.10               0           2.114
     6           0.296       1.99               0           2.004
     7           0.340       1.90               0.076       1.924
    10           0.432       1.76               0.223       1.777
    15           0.524       1.64               0.348       1.652

For further information on quality control, see Reference 7 of §0.4, or Ref-
erence 2 of Chapter IV.
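
The placing of the limits for a chart of means, and the checking of danger signals (1) and (3), may be sketched as follows. Everything here is illustrative: the function names and the numerical readings are hypothetical, and the standard deviation of the mean is assumed to be obtained as the standard deviation of single items divided by the square root of the sample size.

```python
import math

def mean_chart_limits(mu_hat, sigma, sample_size, k=3.0):
    """Control limits mu_hat +/- k * sigma / sqrt(sample_size) for sample means."""
    sigma_mean = sigma / math.sqrt(sample_size)
    return mu_hat - k * sigma_mean, mu_hat + k * sigma_mean

def danger_signals(means, mu_hat, lower, upper, run_length=7):
    """Return indices of points outside the limits, and whether a run of
    `run_length` or more consecutive points lies on one side of the central line."""
    outside = [i for i, m in enumerate(means) if m < lower or m > upper]
    longest, current, previous_side = 0, 0, 0
    for m in means:
        side = (m > mu_hat) - (m < mu_hat)  # +1 above, -1 below, 0 on the line
        if side != 0 and side == previous_side:
            current += 1
        else:
            current = 1 if side != 0 else 0
        previous_side = side
        longest = max(longest, current)
    return outside, longest >= run_length

# Hypothetical process: estimated mean 46.0 lb wt, item s.d. 4.0, samples of 4.
lower, upper = mean_chart_limits(46.0, 4.0, 4)
sample_means = [46.5, 47.0, 46.2, 46.8, 47.5, 46.1, 46.9, 53.0, 45.0]
outside, has_run = danger_signals(sample_means, 46.0, lower, upper)
print(lower, upper, outside, has_run)
```

Signals (2) and (4) call for judgment about nearness to the limits and about trend, and are not attempted in this sketch.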


Exercises 

1. A normal population has mean 20 and standard deviation 2. A sample of 6 items from
this population has a mean 18.2. Can the sample be reasonably regarded as a random
sample from the population? Ans. Not at the 5% level.

2. A very large population has a mean 26.54 ft and a variance 28.3 ft². What percent-
age of random samples of size 135, taken from this population, will have means differing
from the population mean by more than 1 ft? What percentage of samples of size 200 will
have means between 26.1 and 26.7 ft? Ans. 2.9%, 54.4%.

3. A population is known to be normal and to have a standard deviation of 0.104 second.
A random sample of 12 items has a mean of 12.33 seconds. Calculate 95% confidence limits
for the population mean.

4. For the population of Exercise 3, what is the smallest sample size that will ensure,
with 95% confidence, that the sample mean will not differ from the population mean by
more than 0.05 second? Ans. 17.

5. A population is known to have a mean of about 25 bushels/acre and a coefficient of
variation of about [...]. You wish to find, from a single random sample, 95% confidence
limits for the population mean. If these limits are not to differ from each other by more
than 1 bushel/acre how large should the sample be? Ans. 600.

6. Four different boxes of Eddy's matches, from the same carton, contained 55, 58, 63,
and 57 matches. Obtain 95% confidence limits for the mean number of matches in a box.

7. A group of 120 freshmen at University A take a certain standard test and obtain a 
mean score 70 with a standard deviation 14. A group of 80 freshmen at University B take 
the same test and obtain a mean score of 75 with standard deviation 12. Test the hypothe- 
sis that the two groups are random samples from the same population (that is, that the 
difference between the mean scores is not significant). 

8. Twenty lines, each exactly 6 inches long, were drawn. A student estimated by eye
the center of each line. The distance in inches of each point, so estimated, from the left-
hand end of the line was measured, with the following results: 2.97, 3.11, 2.97, 3.18, 3.13,
3.23, 2.98, 3.02, 2.92, 3.13, 3.00, 3.06, 2.98, 3.00, 2.94, 2.98, 3.07, 3.07, 3.03, 3.20. Obtain
95% confidence limits for the population mean. What is the population in this example?
Is there any reason to believe that the student was making a systematic error?

9. (Wilks) An aptitude test was given to two groups of soldiers: (a) a group of 1050
who had been in the army for some time; (b) a group of 631 new selectees. The means and
standard deviations of scores for the two groups were as follows:

          $\bar{x}$       $s$
(a)      47.66      6.77
(b)      46.10      6.79

Find 95% confidence limits for the difference between the population means.

10. Two distinct groups of rats, (a) normal and (b) adrenalectomized, were tested for
blood viscosity. The sample sizes and the means and standard deviations of viscosity
readings were:

          $N$      $\bar{x}$       $s$
(a)      11      3.921      0.527
(b)       9      4.111      0.601

Is the difference in the means significant at the 5% level?

Ans. No. The 95% confidence limits for $\mu_1 - \mu_2$ are $-0.19 \pm 0.56$.

11. (Snedecor) (a) An agronomist interested in the effect of superphosphate on corn
yield added the fertilizer to a mixture of manure and lime. The old and new fertilizers
were tested on five pairs of adjacent plots, the plots in each pair being as alike as possible
except that one member was treated with the old fertilizer and one with the new. The plots
with the superphosphate yielded 20, 6, 4, 3, and 2 bushels per acre more than their parallels.
Was the value of the superphosphate demonstrated?

(b) Suppose the increased yields had been 5, 6, 4, 3 and 2 bushels per acre. Would the
value of the superphosphate have been demonstrated then? Explain the apparent paradox.

12. A physiological experiment was carried out to test the effect of an injection of secretin
on the percentage of reticulocytes in the blood of rabbits. 17 rabbits were tested before
and after injection; the mean of the increases was 0.0635, and the standard deviation of
increases was 0.168. Was the existence of an effect demonstrated?

13. The densities of sulphuric acid in two containers were measured, four determinations
being made on one and six on the other. The results were:

(1) 1.842, 1.846, 1.843, 1.843

(2) 1.848, 1.843, 1.846, 1.847, 1.847, 1.845

Is the difference significant at the 5% level,

(a) if there is no reason beforehand to believe that either lot of acid is the denser?

(b) if we have good reason to suspect that if there is any difference the first will be lighter
than the second? Ans. (a) no; (b) yes.

14. The following table gives the strength (lb wt/in.²) of concrete made with sand con-
taining different percentages of coal. Each sample of concrete was made into four cylinders
which were tested for strength.

Sample number    Percentage coal    $x$ (lb wt/in.²)

      1               0.00          1690, 1580, 1745, 1685
      2               0.05          1550, 1445, 1645, 1545
      3               0.10          1625, 1450, 1510, 1590
      4               0.50          1725, 1550, 1430, 1445
      5               1.00          1530, 1545, 1565, 1520

Test the homogeneity of the variances of these samples. Also test the means of samples 2,
3, 4, and 5, each compared with the mean of sample 1.

15. Arrange the data of Exercise 14 as an analysis of variance, and use the F test to
determine whether the differences between sample means are significant as compared with
the differences within the samples.

16. Five samples, each of four seasoned mine-props, were tested for maximum load.
The means and standard deviations of the maximum load (in units of 1000 lb wt) were

Sample No.      $\bar{x}$        $s$

    1           42.0        8.75
    2           62.0       10.44
    3           65.6        4.72
    4           61.8        8.26
    5           73.5       16.68

For each sample estimate 95% confidence limits for the mean, and do the same for the
combined sample of 20. Test the homogeneity of the variances.

17. Two small samples of herring were measured for length (mm), with the following
results:

(1) 192, 179, 181, 193, 216, 181, 178

(2) 173, 194, 194, 187, 168, 186, 176, 191, 191, 178, 185, 160

Do these samples differ significantly in average length?

18. A manufacturer desires to turn out cotton thread, the breaking strength of which is
to have a mean and standard deviation 6.50 oz and 1.50 oz, respectively. Assuming that
this standard has been attained, what should now be the 99% and the $3\sigma$ control limits for
the mean of routine samples of 10 pieces of thread?

19. Use the method of the auxiliary variable $v$ in §12.5 to find the skewness and kurtosis
of the binomial distribution.
CHAPTER XIII

NON-PARAMETRIC AND ORDER STATISTICS 

13.1 Non-parametric Statistics. In the problems of estimation we have so 
far encountered, we have assumed the form of the distribution of our variate in 
the parent population (for example, binomial or normal) and have endeavored 
to find best values and confidence limits for one or more parameters of this dis- 
tribution. Sometimes, however, we wish to infer something about the distri- 
bution as a whole and are not concerned with the numerical values of the 
parameters. Problems of this nature are called non-parametric. 

13.2 Goodness of Fit. In Chapters VIII and X we saw that some
empirical distributions arising out of sampling experiments, and some observed
frequency distributions, can be fitted more or less closely by theoretical dis-
tributions of the binomial, Poisson, or normal types. No criterion of the
goodness of fit was mentioned, however. We shall now discuss a widely
used technique, known as the Chi-square ($\chi^2$) test, for judging whether or not
the fit is satisfactory.

Consider, for example, the binomial distribution fitted to the result of a
sampling experiment in Table 35, §10.5. For each value of $x$ from 0 to 10 we
have in the second column an observed frequency ($f_o$) and in the third column
a calculated frequency ($f_c$), based on an assumed probability of success ($\theta$)
equal to 1/3. The values $f_c$, divided by the total frequency $N$, represent the
probabilities (on the hypothesis that $\theta = 1/3$) of exactly 0, 1, 2, ..., 10 suc-
cesses in a trial, "success" meaning here a red ball in a sample of 10 balls.
On the basis of these probabilities we can calculate the chance of getting, in
350 samples, the observed set of frequencies $f_o$, that is, the chance that there
will be 2 cases with $x = 0$, 22 with $x = 1$, and so on. This distribution in
classes is called a multinomial distribution (the binomial is a special case with
only two classes). It was proved by Karl Pearson that the quantity

(13.1) $\chi_s^2 = \sum (f_o - f_c)^2/f_c$

where the sum is over the $k$ classes in the distribution, is, for large values of
$N$, distributed like the quantity $\chi^2$ described in §12.13, with $k - 1$ degrees of
freedom. For any given set of frequencies $f_o$ and a corresponding set of inde-
pendently calculated frequencies $f_c$, we can therefore find the value of $\chi_s^2$
and use the table of $\chi^2$ to test its significance.

Clearly, the greater the differences between the observed and calculated
frequencies, the greater will be the value of $\chi_s^2$, so that, generally speaking,
the larger $\chi_s^2$, the worse the fit. More precisely, the area under the $\chi^2$ curve,
beyond the given value of $\chi_s^2$, represents the probability of obtaining by
chance, on the null hypothesis that the distribution in the parent population
is the assumed theoretical one, a value of $\chi_s^2$ at least as great as the one actu-
ally found, or in other words the probability of a fit at least as bad as the
observed fit. If this probability is not too small, the null hypothesis may be
accepted and the fit regarded as satisfactory. If, for example, there are 10
classes in a distribution (so that $k - 1 = 9$), we see from Table III in the
Appendix that the probability that $\chi^2 > 16.919$ is 0.05. A value of $\chi_s^2$ of 16
or less would be regarded by most statisticians as not furnishing good evidence
against the null hypothesis, and the null hypothesis could therefore reasonably
be accepted. On the other hand, the probability is less than 0.01 that
$\chi^2 > 22$ (for 9 degrees of freedom) and a value of $\chi_s^2$ as large as this would
provide good evidence against the null hypothesis. Since the test of goodness
of fit is concerned with the whole course of the distribution, it is a non-para-
metric test.

13.3 Pooling of Class Frequencies. The proof of the approximate $\chi^2$
distribution for $\chi_s^2$ in (13.1) rests on the assumption that none of the
frequencies $f_c$ is very small. How small $f_c$ may safely be is an open question, but we
shall probably not be far wrong in putting 5 as the lower limit. If it happens
(as it often does in practice) that a few of the end classes have very small
frequencies, it is well to group these classes together until no class contains
fewer than 5, before applying the $\chi^2$ test. Thus in the data of Table 35,
§10.5, the classes x = 8, 9, and 10 have respectively $f_c$ = 1.1, 0.1, and 0.0, so
that these should be pooled with the class x = 7. The number of classes in
the table is then reduced to 8, with 7 degrees of freedom. The loss of one
degree of freedom is here attributable to the fact that the total frequency is
fixed. In distributing 350 objects among 8 classes we can (within limits) put
as many as we like in 7 of the 8 classes, but the number in the 8th class is
then determined by the number of objects we have left.

The calculation of $\chi_s^2$ for the data of Table 35 with $\theta = 1/3$ is shown in
Table 45. The sum of the last column is 15.2, and the probability of a value
of $\chi^2$ at least as great as this is about 0.033. The 5% point is 14.1 and the
1% point 18.5. Our value therefore is great enough to make us reject, at the
5% level, the hypothesis that the distribution is really binomial with
parameter $\theta = 1/3$.

Table 45. Goodness of Fit of Binomial

   x       $f_o$     $f_c$     $f_o - f_c$    $(f_o - f_c)^2$    $(f_o - f_c)^2/f_c$
   0         2        6.1         -4.1             16.8                 2.8
   1        22       30.3         -8.3             68.9                 2.3
   2        63       68.3         -5.3             28.1                 0.4
   3        76       91.0        -15.0            225.0                 2.5
   4        96       79.7         16.3            265.7                 3.3
   5        56       47.8          8.2             67.2                 1.4
   6        26       19.9          6.1             37.2                 1.9
   7-10      9        6.9          2.1              4.4                 0.6

           350      350.0                                              15.2
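The pooling and the arithmetic of Table 45 can be retraced with a short sketch. It is only an illustration under stated assumptions: the helper name `pool_upper_tail` is hypothetical, it pools from the upper end only (which is all this example needs), and it works with unrounded calculated frequencies, so individual terms differ slightly from the rounded entries of the table.

```python
from math import comb

N_SAMPLES = 350      # number of samples drawn
BALLS = 10           # balls in each sample
THETA = 1 / 3        # assumed probability of a red ball

# Observed frequencies as in Table 45; the last entry is x = 7..10 pooled.
observed = [2, 22, 63, 76, 96, 56, 26, 9]

# Calculated (binomial) frequencies for x = 0, 1, ..., 10, unrounded.
expected_all = [
    N_SAMPLES * comb(BALLS, x) * THETA**x * (1 - THETA) ** (BALLS - x)
    for x in range(BALLS + 1)
]


def pool_upper_tail(freqs, minimum=5.0):
    """Merge classes from the upper end until the last class reaches `minimum`.

    Hypothetical helper: it handles the upper tail only.
    """
    pooled = list(freqs)
    while len(pooled) > 1 and pooled[-1] < minimum:
        last = pooled.pop()
        pooled[-1] += last
    return pooled


expected = pool_upper_tail(expected_all)
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
degrees_of_freedom = len(expected) - 1

print(f"classes after pooling: {len(expected)}")
print(f"chi-square = {chi_sq:.2f} with {degrees_of_freedom} degrees of freedom")
```

The statistic is then compared with the tabulated 5% and 1% points for 7 degrees of freedom, exactly as in the text.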

13.4 The Chi-Square Test of Hypotheses. A particular distribution
among classes may be suggested by a preconceived theory or hypothesis, and
this theory can then be tested by comparing the suggested distribution with
the one actually observed. Thus, in a classic experiment, the Abbé Mendel
observed the shape and color of peas from a number of plants in the first
generation progeny of a cross, and found that they could be classified in four
groups, as follows:

    Round, yellow                  315
    Round, green                   108
    Angular (wrinkled), yellow     101
    Angular, green                  32

According to Mendel's theory of heredity the expected numbers should be in
the ratio 9 : 3 : 3 : 1, and therefore, for a total of 556, are 312.75, 104.25,
104.25, and 34.75. The value of $\chi_s^2$ from these data is 0.47, with 3 degrees
of freedom. The probability of a value of $\chi^2$ as great as this is about
0.92, so that the agreement of theory and experiment is closer than would
be expected.

Very high values of the probability (say higher than 0.99) are sometimes 
encountered and are usually to be viewed with suspicion. The fit is too good 
to be true, and it is likely that the sample investigated is not truly a random 
sample of the population. 
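The figures quoted for Mendel's peas can be checked numerically. The sketch below is an illustration rather than the book's own procedure: instead of a printed table it uses the closed-form upper tail of the $\chi^2$ distribution, which in this simple form holds for 3 degrees of freedom only, and the function name `chi2_tail_3df` is an assumption of the example.

```python
from math import erfc, exp, pi, sqrt

observed = [315, 108, 101, 32]     # the four classes of peas
ratio = [9, 3, 3, 1]               # Mendelian ratio

total = sum(observed)
expected = [total * r / sum(ratio) for r in ratio]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_tail_3df(x):
    """P(chi-square >= x) for 3 degrees of freedom (closed form for 3 d.f. only)."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)


p_value = chi2_tail_3df(chi_sq)

print("expected:", expected)
print(f"chi-square = {chi_sq:.2f}, P = {p_value:.3f}")
```

A probability this close to 1 is what the text means by a fit closer than would be expected.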

13.5 The Chi-Square Test of Goodness of Fit for Graduated Distributions.
In fitting a theoretical curve to a distribution* it is often necessary to calculate
one or more parameters of the curve from the distribution itself. The true
values of the parameters are therefore replaced by estimates, and it can no
longer be taken for granted that the limiting distribution of $\chi_s^2$ for large N is
a $\chi^2$ distribution. However, if the estimates are the best possible ones (said
to be most-efficient) it is true that the limiting distribution is a $\chi^2$ distribution
with k - 1 - p degrees of freedom, where p is the number of parameters
estimated from the sample. It is unfortunately not true that the method of
moments (the one we have adopted in earlier chapters) always gives
most-efficient estimates, but if the theoretical distribution is normal, binomial, or
Poisson this method is quite satisfactory.

The number p of parameters varies with the type of distribution, being 1
for the Poisson curve, 2 for the normal, and 3 or more for various skew curves.
For the binomial there is usually only one parameter ($\theta$) to estimate, the other
(s) being fixed by the conditions of the problem. In Table 35, §10.5, values
of $f_c$ are calculated in column 4 for a value of $\theta$ (0.36) estimated from the
sample. If we pool the frequencies for x = 0 and 1 and also for x = 7, 8, 9,
and 10, we have 7 classes and the number of degrees of freedom is 5. The
value of $\chi_s^2$ turns out to be 3.81, and the probability of a fit worse than this
is about 0.57. The fit is therefore excellent.

* The curve is, of course, "fitted" to a histogram. The term "theoretical curve" is often
used to mean a theoretical distribution, particularly for a continuous variate.

In fitting a normal curve to the distribution of weights of Glasgow
schoolgirls (see Fig. 31, §§8.6 and 8.7), two parameters, $\mu$ and $\sigma$, were estimated
from the sample. In Table 31 theoretical frequencies are given for each class
interval corresponding to the observed frequencies for the same intervals.
This being a continuous distribution, the theoretical frequencies are areas
under the normal curve between ordinates erected at the class boundaries;
the first area extends from $-\infty$ to the end of the first interval and the last
area extends from the beginning of the last interval to $+\infty$. Thus, for the
interval from 35.5 to 39.5 lb, $f_c = 60.3$. This area, divided by the total
frequency, is denoted by $\Delta A$ (= 0.0603) and is, on the null hypothesis, the
probability that a schoolgirl selected at random out of the whole population of
Glasgow (in the age group of the sample and at the time the sample was
taken) would have a weight between 35.5 and 39.5 lb.

If the two classes at the beginning and the two classes at the end in Table 
31 are pooled, we have the comparison of observed and calculated frequencies 
given in Table 46. It is evident that the fit is excellent. 

Table 46. Goodness of Fit of Normal Curve

    $f_o$      $f_c$     $f_o - f_c$    $(f_o - f_c)^2$    $(f_o - f_c)^2/f_c$
      15       17.2         -2.2              4.8                0.28
      56       60.3         -4.3             18.5                0.31
     172      155.5         16.5            272.2                1.75
     245      252.4         -7.4             54.8                0.22
     263      258.7          4.3             18.5                0.07
     156      167.2        -11.2            125.4                0.75
      67       68.1         -1.1              1.2                0.02
      26       20.6          5.4             29.2                1.42

    1000     1000.0          0                                   4.82

$\chi_s^2 = 4.82$, $n = 8 - 1 - 2 = 5$
$P = 0.45$

13.6 Tests of Randomness. We have repeatedly used the word "random"
in speaking of samples and have defined a random sample as one in which
every individual in the population has an equal chance of being included.
This definition is not, however, of much use in deciding practically whether
a sample is random or not. We have to look at the various items of the
sample, in the order in which they come, and see whether or not they exhibit a
satisfactory degree of haphazardness. In calculating the statistics of a
sample we have not hitherto troubled about the order of arrangement of the
items in the sample, but this order is essential in a discussion of randomness.

Consider, for simplicity, the tossing of a coin, and suppose we denote "head"
by 0 and "tail" by 1. Then a succession of tosses, in the order in which they
are made, will be represented by a set of numbers like

(13.2) 01100010100111101101010100 ...

We naturally expect that the proportion of 1's in such a set will tend, as
the number of tosses increases, toward a value in the neighborhood of 1/2.
But, apart from this, we also expect the sequence to be haphazard. That is,
we should think it very strange to get a sequence like

(13.3) 01010101010101010101010101 ...
or

(13.4) 11111100000011111100000011 ...

One method of testing a very long sequence for randomness is to pick out
a subsequence (for example, every second term) and see whether the
proportion of 1's in the subsequence is the same as in the original sequence. If it is
so in every subsequence that we can choose in this way, then, according to
R. von Mises (see Reference 1, page 143), the sequence is random, but this is
clearly not a practicable test, although we can, of course, try it on a few
subsequences. Thus, if we choose the 1st, 3rd, 5th, ... items of (13.2) we get
0101101110000 ..., and the proportion of 1's is still near 1/2, but
if we do the same in (13.3) we get only 0's. This test, therefore, rules out
(13.3) as random. It does not rule out (13.4), but if we take a different
subsequence (choosing 1st, 13th, 25th, etc., at intervals of 12), we get all 1's,
and thus (13.4) is also non-random by the foregoing test.

A more practical test of randomness is based on the number of runs of zeros
and ones in the sequence. A run of zeros is a set of successive zeros closed off
at both ends by 1's (except at the beginning and end of the sequence) and
similarly for a run of 1's. Thus in (13.2) we have in succession runs of one
0, two 1's, three 0's, one 1, one 0, one 1, two 0's and so on, a total of 17 runs in
all. In (13.3) there are 26 runs and in (13.4) only 5 runs. Clearly, a
non-random sequence may have a large number of runs or a small number,
compared with a random sequence. The probability of an assigned number of
runs, on the hypothesis of randomness, can be calculated, and also upper and
lower limits can be assigned within which we can be confident (with a specified
degree of confidence) that the number of runs will lie. If it does not, we
reject the hypothesis of randomness at the corresponding level. It turns
out that in a sequence of 13 0's and 13 1's (like (13.2)), the number of runs
should, at the 95% level, lie between 9 and 19 inclusive, so that this test
rejects (13.3) and (13.4) without rejecting (13.2).
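The run counts and the subsequence checks described above are easy to reproduce. In the sketch below the three sequences are typed in as the 26 terms shown in (13.2), (13.3) and (13.4); the function names `count_runs` and `every_kth` are illustrative choices, not taken from the text.

```python
from itertools import groupby


def count_runs(sequence):
    """Number of runs, i.e. maximal blocks of identical consecutive symbols."""
    return sum(1 for _ in groupby(sequence))


def every_kth(sequence, step, start=0):
    """Subsequence formed by the terms at positions start, start + step, ..."""
    return sequence[start::step]


seq_13_2 = "01100010100111101101010100"
seq_13_3 = "01" * 13
seq_13_4 = "111111000000" * 2 + "11"

for name, seq in [("(13.2)", seq_13_2), ("(13.3)", seq_13_3), ("(13.4)", seq_13_4)]:
    print(name, "runs:", count_runs(seq))

# Subsequence checks in the sense of von Mises.
print(every_kth(seq_13_2, 2))    # 1st, 3rd, 5th, ... terms of (13.2)
print(every_kth(seq_13_3, 2))    # the same for (13.3): zeros only
print(every_kth(seq_13_4, 12))   # 1st, 13th, 25th terms of (13.4): ones only
```

`itertools.groupby` groups consecutive equal symbols, so counting its groups is the same as counting runs.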
13.7 Distribution of Number of Runs. If we have m objects of one kind
(say 0's) and n objects of another kind (say 1's) arranged along a line, the
number of possible arrangements is $C(m + n, m)$. We will suppose that
$m \le n$ (that is, we let m refer to whichever set of objects is the fewer in
number). The number of arrangements with exactly u runs can be shown
to be

(13.5) $f_u = 2\,C(m - 1, k - 1)\,C(n - 1, k - 1)$

if u is even (= 2k) and

(13.6) $f_u = C(m - 1, k - 1)\,C(n - 1, k - 2) + C(m - 1, k - 2)\,C(n - 1, k - 1)$

if u is odd (= 2k - 1). Here k can take all integral values from 1 to m + 1.
The probability that the number of runs is equal to or less than u' in a random
arrangement is therefore given by

(13.7) $P\{u \le u'\} = \sum_{u=2}^{u'} f_u / C(m + n, m)$

Thus, if m = n = 5, the probability of only 2 runs is given by putting
k = 1 in (13.5) and is

$P\{u = 2\} = 2[C(4, 0)]^2 / C(10, 5) = 1/126$

and the probability that the number of runs is equal to or less than 4 is

$P\{u \le 4\} = (f_2 + f_3 + f_4)/C(10, 5)$
$= [2\{C(4, 0)\}^2 + 2\,C(4, 1)\,C(4, 0) + 2\{C(4, 1)\}^2]/C(10, 5)$
$= (2 + 8 + 32)/252 = 1/6$

For m > 10, u is approximately normal, with mean $1 + 2mn/(m + n)$ and
variance $\dfrac{2mn(2mn - m - n)}{(m + n)^2 (m + n - 1)}$.

Tables giving $P\{u \le u'\}$ for $m \le n \le 20$ have been prepared by F. S.
Swed and C. Eisenhart (Reference 1). They have also calculated a set of
confidence limits for u. Since u is necessarily an integer, the probability
cannot, in general, be adjusted exactly to a predetermined value. For the
upper 95% limit, they give the smallest u' for which $P\{u \le u'\} \ge 0.975$ and
for the lower limit the largest u' for which $P\{u \le u'\} \le 0.025$. The
probability for a value between these limits is approximately 0.95.
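Formulas (13.5) to (13.7) translate directly into exact arithmetic. The sketch below is a minimal illustration with hypothetical function names; binomial coefficients with an out-of-range lower index are taken as zero. The function `run_limits` encodes one reading of the limits, under which the hypothesis is rejected when u falls below the lower or above the upper limit: the lower limit is taken as one more than the largest u' whose cumulative probability does not exceed the tail area. That convention is an assumption of this example.

```python
from fractions import Fraction
from math import comb


def c(n, k):
    """Binomial coefficient, taken as zero when k is out of range."""
    return comb(n, k) if 0 <= k <= n else 0


def arrangements_with_runs(m, n, u):
    """Number of arrangements of m 0's and n 1's with exactly u runs, (13.5)-(13.6)."""
    if u % 2 == 0:
        k = u // 2
        return 2 * c(m - 1, k - 1) * c(n - 1, k - 1)
    k = (u + 1) // 2
    return c(m - 1, k - 1) * c(n - 1, k - 2) + c(m - 1, k - 2) * c(n - 1, k - 1)


def prob_runs_at_most(m, n, u_max):
    """P{u <= u_max} from (13.7), as an exact fraction."""
    favourable = sum(arrangements_with_runs(m, n, u) for u in range(2, u_max + 1))
    return Fraction(favourable, comb(m + n, m))


def run_limits(m, n, level=0.95):
    """Limits outside which randomness is rejected (assumed convention, see lead-in)."""
    tail = (1 - level) / 2
    lower, upper = 2, None
    for u in range(2, m + n + 1):
        p = prob_runs_at_most(m, n, u)
        if p <= tail:
            lower = u + 1
        if upper is None and p >= 1 - tail:
            upper = u
    return lower, upper


print(prob_runs_at_most(5, 5, 2), prob_runs_at_most(5, 5, 4))
print(run_limits(5, 5), run_limits(8, 8))
```

Using `Fraction` keeps the probabilities exact, so small cases can be compared digit for digit with hand calculations.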

Table 47 is a slightly modified extract from these tables for the case m = n. 
This table may be used to assess the significance of observed runs above or 
below the central line on a control chart, or above and below the median value 
in a sequence of measurements. The hypothesis of randomness is rejected 
if the observed u is less than the lower limit, or greater than the upper limit. 
Table 47. Confidence Limits for Number of Runs in Sequence of m 0's
and m 1's

           90% limits        95% limits
    m     lower   upper     lower   upper
    5       4       8         3       9
    6       4      10         4      10
    7       5      11         4      12
    8       6      12         5      13
    9       7      13         6      14
   10       7      15         7      15
   11       8      16         8      16
   12       9      17         8      18
   13      10      18         9      19
   14      11      19        10      20
   15      12      20        11      21
   16      12      22        12      22
   17      13      23        12      24
   18      14      24        13      25
   19      15      25        14      26
   20      16      26        15      27
   21      17      27        16      28
   22      18      28        17      29
   23      18      30        17      31
   24      19      31        18      32
   25      20      32        19      33
   26      21      33        20      34
   27      22      34        21      35
   28      23      35        22      36
   29      24      36        23      37
   30      25      37        23      39
   31      26      38        24      40
   32      26      40        25      41
   33      27      41        26      42
   34      28      42        27      43
   35      29      43        28      44
   36      30      44        29      45
   37      31      45        30      46
   38      32      46        31      47
   39      33      47        31      49
   40      34      48        32      50

For m > 10, the number of runs is approximately normally distributed, with mean m + 1
and variance m(m - 1)/(2m - 1).

Example 1. If the 30 observations in Fig. 48, §12.17, are classified as above or below the
central value (a or b), we obtain the sequence

babbbbaabbaaaabbbaabaabaabbbbb,

for which m = 13, n = 17, u = 13. In this case m is not the same as n, and from the original
tables we find that the 95% limits for u are 10 and 21. There is no need to worry about a
lack of randomness. Limits 11 and 21 are given in Table 47 for m = 15, and as a general
rule if m and n are reasonably large and not very different we can, as an approximation, use
an average value for m in Table 47.
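When m and n are unequal and tables are not at hand, the normal approximation for the number of runs gives a quick check on Example 1. This is a rough sketch only: the function names are illustrative, no continuity correction is applied, and the standardized deviate is meant to be compared with the usual normal limits (about 1.96 in absolute value at the 95% level).

```python
from math import sqrt


def runs_normal_approx(m, n):
    """Mean and variance of the number of runs under randomness."""
    mean = 1 + 2 * m * n / (m + n)
    variance = 2 * m * n * (2 * m * n - m - n) / ((m + n) ** 2 * (m + n - 1))
    return mean, variance


def runs_z_score(u, m, n):
    """Standardized deviate of the observed number of runs (no continuity correction)."""
    mean, variance = runs_normal_approx(m, n)
    return (u - mean) / sqrt(variance)


sequence = "babbbbaabbaaaabbbaabaabaabbbbb"
m = sequence.count("a")
n = sequence.count("b")
# A new run starts wherever a symbol differs from its predecessor.
u = 1 + sum(1 for prev, cur in zip(sequence, sequence[1:]) if prev != cur)

z = runs_z_score(u, m, n)
print(f"m = {m}, n = {n}, u = {u}, z = {z:.2f}")
```

A deviate of about one standard deviation below the mean agrees with the conclusion that there is no need to worry about a lack of randomness.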

Example 2. A student opened a set of mathematical tables, with the entries blocked off
in sets of five, and, starting anywhere, added the five terminal digits in each block of five
numbers. The sums so obtained for 50 blocks were:

12, 15, 18, 30, 33, 25, 28, 22, 23, 17

25, 18, 22, 13, 17, 18, 22, 25, 27, 30

28, 32, 24, 27, 20, 22, 15, 18, 20, 23

12, 15, 27, 30, 33, 25, 28, 20, 23, 17

26, 18, 20, 13, 17, 18, 22, 33, 27, 30

Test this sequence for randomness.

The median of these numbers is 22, and if we assign a's and b's according as the numbers
are above or below the median (leaving a blank, shown as _, for each number equal to the
median) we get

bbbaaaa _ abab _ bbb _ aaaaaaab _ bbbabb
aaaaabababbbbb _ aaa

There are 21 b's, 24 a's and 5 numbers which lie on the median. We can fill in these five
with four b's and one a in any way we like, preferably (so as to be conservative in rejecting
the hypothesis of randomness) increasing the number of runs. Thus, if we complete the
set of a's and b's as below, we have m = n = 25, u = 20,

bbbaaaababababbbbaaaaaaab

bbbbabbaaaaabababbbbbbaaa

The number of runs is less than we should expect, but well within the 95% confidence limits,
so that the hypothesis of randomness is not rejected.

By filling the first blank space with an a and the others with b's, we get only 16 runs, which
would mean rejection of the hypothesis of randomness at the 95% level. The reason for
filling in the spaces so as to increase the number of runs is that, in most practical cases
where we want to test randomness (as in quality control work), the deviation from
expectation is more likely to be in the direction of fewer and longer runs than in that of more and
shorter runs. Wear on a machine tool, for example, may cause a gradual increase in the
diameter of a machine part, or a slight slip in the setting may cause a sudden increase.
Both causes will tend to produce long runs. If therefore we make the number of runs as
large as possible, we reduce the risk of having to look for trouble where really none exists.

13.8 Run Test of Difference Between Two Samples. Given two samples,
each consisting of m values of a variate x, we can make a rough test of the
hypothesis that they come from the same population. The method is to
arrange the 2m values in order of magnitude and test whether this order is
random in respect of the two samples. Since, if the two samples are
significantly different, the number of runs will be smaller than if they come from
the same population, a one-sided test is indicated. The lower 90% limit in
Table 47 will therefore give a significance level of 5% and the lower 95% limit
a significance level of 2.5%.
For instance, the following data given by Snedecor (Reference 2) refer to
daily gains in weight (lb) of two lots of calves, each lot on a different ration:

I    1.95, 2.17, 2.06, 2.11, 2.24, 2.52, 2.04, 1.95
II   1.82, 1.85, 1.87, 1.74, 2.04, 1.78, 1.76, 1.86

Placed in order of magnitude, with those from lot II underlined (shown here
between underscores), these values are:

_1.74_, _1.76_, _1.78_, _1.82_, _1.85_, _1.86_, _1.87_, 1.95, 1.95,
_2.04_, 2.04, 2.06, 2.11, 2.17, 2.24, 2.52

The number of runs (of underlined or non-underlined values) is 4. From
Table 47 the 5% significance level corresponds to u = 6, so that there is a
significant lack of randomness at this level. In fact, the probability of 4
runs or fewer may be computed from equation (13.7) as 19/2145 = 0.009,
so that this number is actually significant at the 1% level.
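The merging and run counting for the calf data can be sketched as follows. The function name `runs_in_merged_order` and its tie-breaking flag are assumptions of this example; the flag simply places tied values from lot I before or after those from lot II, which covers both possible orders when, as here, there is a single tied pair.

```python
from itertools import groupby

lot_1 = [1.95, 2.17, 2.06, 2.11, 2.24, 2.52, 2.04, 1.95]
lot_2 = [1.82, 1.85, 1.87, 1.74, 2.04, 1.78, 1.76, 1.86]


def runs_in_merged_order(first, second, ties_first_sample_first=True):
    """Merge two samples in order of magnitude and count runs of sample labels.

    Tied values are ordered according to the flag (an illustrative convention).
    """
    rank_first, rank_second = (0, 1) if ties_first_sample_first else (1, 0)
    tagged = [(value, rank_first, "I") for value in first]
    tagged += [(value, rank_second, "II") for value in second]
    tagged.sort(key=lambda item: (item[0], item[1]))
    labels = [label for _, _, label in tagged]
    return sum(1 for _ in groupby(labels)), labels


runs_a, labels_a = runs_in_merged_order(lot_1, lot_2, ties_first_sample_first=True)
runs_b, labels_b = runs_in_merged_order(lot_1, lot_2, ties_first_sample_first=False)

print(runs_a, " ".join(labels_a))
print(runs_b, " ".join(labels_b))
```

The resulting number of runs is then referred to the lower limits of Table 47, or to the exact probability from (13.7).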

The advantage of this test is that it assumes very little about the nature
of the parent population: merely that the variate is continuous and that
the samples are random and independent. The disadvantage is a lack of
power, a very marked difference between the samples being required, as a
rule, to produce a significant departure from randomness.

In the example given above there were two identical values (each 2.04),
one from each sample. In whichever order these are placed, the number of
runs is still 4. Sometimes, however, the number of runs will be affected by
the placing of such identical pairs, and then all the possible orders should
be tested.

13.9 Random Numbers. The choosing of a truly random sample, even
from an artificial population of discs, balls, cards, or the like, is not easily
accomplished, since the common methods of mixing and shuffling are often
inadequate. When it is necessary to pick random samples from a crop in the
ground, or a field subdivided into small plots, or a group of experimental
animals, the task is much harder. If one relies on personal judgments, it is
difficult to avoid a tendency to pick what seem to be typical rather than
random samples. Experience has shown that it is advisable to use a set of
numbers which have been thoroughly tested for randomness, and to rely on
these numbers for picking the sample. Tables of random numbers have been
compiled by Tippett, by Kendall and Babington Smith, and by Fisher and
Yates (see Reference 3). A 4-page extract from the tables of Kendall and
Babington Smith is given in the Appendix (Table V).

These tables consist of four-digit numbers, obtained by a mechanical process
and tested in several ways for randomness. In order to pick out a random
sample from a given finite population, the items of the population are
numbered consecutively and blocks of consecutive 4-digit numbers are allotted to
each item, the size of the blocks being such that the whole range of 10,000
numbers is fairly well covered. Thus, if we want to pick a sample of 20 from
a population of 300, we can allot blocks of 30 numbers to each item. To the
first item will correspond numbers 0000 to 0029, to the second item, numbers
0030 to 0059, and so on, until to the 300th item correspond numbers 8970 to
8999. We then read off 20 consecutive numbers, starting anywhere in the
table of random numbers (and disregarding numbers beginning with 9).
Each number will fall in some block and will therefore represent some item
in the population and that item is picked for the sample.

With a population of 100 or less we can read the random numbers as 2-digit
numbers, each 4-digit number being split in two.
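The allotment of blocks of 30 numbers to each of 300 items amounts to an integer division. The sketch below uses hypothetical function names and a short made-up list of 4-digit numbers in place of a printed table. Repeated items are kept as they occur; how repeats should be treated is not stated in the text, so that choice is an assumption of the example.

```python
def item_for_number(number, population=300, block=30):
    """Item (1-based) whose block contains `number`, or None if no block does."""
    if not 0 <= number <= 9999:
        raise ValueError("expected a 4-digit random number between 0000 and 9999")
    item = number // block + 1
    return item if item <= population else None


def pick_sample(random_numbers, size, population=300, block=30):
    """Read numbers in order, skipping those outside every block, until `size` items."""
    sample = []
    for number in random_numbers:
        item = item_for_number(number, population, block)
        if item is not None:
            sample.append(item)   # repeats are kept (assumption of this sketch)
        if len(sample) == size:
            break
    return sample


# Illustrative input only; in practice the numbers come from a random-number table.
numbers_read = [9327, 6908, 2511, 8268]
print(pick_sample(numbers_read, size=3))
```

Numbers from 9000 to 9999 fall in no block and are passed over, which is the rule of disregarding numbers beginning with 9.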

A similar procedure will enable us to choose samples from a continuous
distribution grouped in classes. For example, suppose we wish to pick a
random subsample of 50 Glasgow schoolgirls from the sample of 1000 described
in Table 27, §7.7. We must divide the 10,000 possible random numbers into
blocks, of sizes proportional to the frequencies in the various classes, as shown
in Table 48, where the first block contains 10 numbers, the next 140, and so on.
We then read off a list of 50 random numbers, such as 9327, 6908, 2511, 8268,
3768, 6735, 9214, 0740, ... and for each number take any girl from the
corresponding class: in this example, from classes 8, 6, 5, 7, 5, 6, 8, 4, ...

Table 48. Allocation of Random Numbers for Subsampling from
Sample of Table 27

   Class     Mark (lb)      f       Block of Numbers
    (1)        29.5          1        0000-0009
    (2)        33.5         14        0010-0149
    (3)        37.5         56        0150-0709
    (4)        41.5        172        0710-2429
    (5)        45.5        245        2430-4879
    (6)        49.5        263        4880-7509
    (7)        53.5        156        7510-9069
    (8)        57.5         67        9070-9739
    (9)        61.5         23        9740-9969
   (10)        65.5          3        9970-9999

                          1000
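The blocks of Table 48 follow from the cumulative frequencies, and a random number can be assigned to its class by a binary search on the block boundaries. The sketch below is illustrative; the names `class_for_number` and `blocks` are not from the text.

```python
from bisect import bisect_right
from itertools import accumulate

class_marks = [29.5, 33.5, 37.5, 41.5, 45.5, 49.5, 53.5, 57.5, 61.5, 65.5]
frequencies = [1, 14, 56, 172, 245, 263, 156, 67, 23, 3]

# Each unit of frequency receives 10 of the 10,000 four-digit numbers.
scale = 10_000 // sum(frequencies)
upper_bounds = [scale * total for total in accumulate(frequencies)]  # exclusive ends


def class_for_number(number):
    """Class (1-based) whose block of random numbers contains `number`."""
    return bisect_right(upper_bounds, number) + 1


def blocks():
    """Inclusive (first, last) number of each block, as in Table 48."""
    starts = [0] + upper_bounds[:-1]
    return [(start, end - 1) for start, end in zip(starts, upper_bounds)]


numbers_read = [9327, 6908, 2511, 8268, 3768, 6735, 9214, 740]
print([class_for_number(x) for x in numbers_read])
for mark, (first, last) in zip(class_marks, blocks()):
    print(f"{mark:5.1f}  {first:04d}-{last:04d}")
```

The same construction applies to any grouped distribution whose total frequency divides 10,000 evenly; other totals would need a rounding rule, which this sketch does not attempt.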



For use in sampling experiments we can make up a similar table corre- 
sponding to any given population. If the population distribution is continuous 
(normal, for example), we can use a table of areas to give the sizes of blocks 
of random numbers corresponding to specified intervals of the variate x. In 
so doing we are, of course, replacing our continuous distribution by a dis- 
continuous one, but if there are 20 to 30 classes corresponding to the effective 
range of the distribution, the resulting approximation is quite satisfactory. 
For example, if we want a normal distribution with $\mu = 20$ and $\sigma = 4$, we
can choose the x-interval as unity, divide the whole range into intervals such
as 6.5 to 7.5, 7.5 to 8.5, etc., write $z = (x - 20)/4$, and find the corresponding
areas $\Delta A$. These areas, multiplied by 10,000, will give the sizes of the
corresponding blocks of 4-digit random numbers.

Tables of random normal numbers have been compiled. One such table is 
given in Dixon and Massey's "Statistical Analysis" (Reference 5 of §0.4).
These tables can be used directly to give random samples from the particular 
normal population specified at the head of the table. 

13.10 The Sign Test for Differences in Paired Samples. The ordinary
t-test for the reality of an observed effect in paired samples assumes that all
the pairs may be regarded as random samples from the same population of
pairs. Sometimes this assumption cannot be made, the pairs having been
observed under widely different conditions. A non-parametric test, based
only on the signs of the differences, can be used in such cases, although
naturally it is not as powerful a test as the t-test, since it uses less information and
makes fewer assumptions. It is, however, very simple and easy to apply.

Suppose A and B are two materials or varieties or treatments to be
compared, and let $x_1$ be a measurement on A and $x_2$ the corresponding
measurement on B in the same pair. Let there be N differences $d_i = x_{1i} - x_{2i}$, and
let r be the number of times the less frequent sign occurs in the set of $d_i$, so
that $r \le N/2$. It may happen that some differences are exactly zero; these
are excluded and the sample size correspondingly reduced. Then the
distribution of r is binomial with $\theta = 1/2$, and the critical values of r, corresponding to
assigned significance levels, can be obtained from tables of the binomial
distribution. The null hypothesis here is that each $d_i$ has a distribution with
median 0 (this distribution not being necessarily the same for all the $d_i$). The
hypothesis is rejected if r differs significantly from N - r.

In Table 40, §12.12, data were given on hemoglobin in anemic rats, and the
t-test was used to test the significance of the observed effect of a change in
diet. If the experiments on the various pairs had been carried out in
different places, with different-sized rats, of different racial strains, and so on,
the t-test would have been inapplicable. According to the sign test, r = 4,
N = 12. If we suppose that the difference in diet could only increase x, if it
has any effect at all, we are interested in the probability that the number of
negative signs is 4 or less. This is given by

$\sum_{r=0}^{4} C(N, r)\,(1/2)^N = 0.194$

and therefore the observed effect is, by this test, non-significant, as in fact it
also is by the t-test. If we do not know which way the effect will go, we want
the probability that r is 4 or less for either positive or negative signs, and this is
double the previous probability.
Table 49 (see Reference 7) gives for various values of N approximate
significance levels of r (for a two-tailed test). Values equal to or less than those
given in the table are significant at the indicated level. For a one-tailed test,
the significance level should be halved. For significance at the 5% level in
the example quoted above, the number of minus signs would have to be
2 or less.


Table 49. Critical Values of r for the Sign Test (Two-tailed)

    N    5%   10%        N    5%   10%        N    5%   10%
    9     1     1       36    11    12       63    23    24
   10     1     1       37    12    13       64    23    24
   11     1     2       38    12    13       65    24    25
   12     2     2       39    12    13       66    24    25
   13     2     3       40    13    14       67    25    26
   14     2     3       41    13    14       68    25    26
   15     3     3       42    14    15       69    25    27
   16     3     4       43    14    15       70    26    27
   17     4     4       44    15    16       71    26    28
   18     4     5       45    15    16       72    27    28
   19     4     5       46    15    16       73    27    28
   20     5     5       47    16    17       74    28    29
   21     5     6       48    16    17       75    28    29
   22     5     6       49    17    18       76    28    30
   23     6     7       50    17    18       77    29    30
   24     6     7       51    18    19       78    29    31
   25     7     7       52    18    19       79    30    31
   26     7     8       53    18    20       80    30    32
   27     7     8       54    19    20       81    31    32
   28     8     9       55    19    20       82    31    33
   29     8     9       56    20    21       83    32    33
   30     9    10       57    20    21       84    32    33
   31     9    10       58    21    22       85    32    34
   32     9    10       59    21    22       86    33    34
   33    10    11       60    21    23       87    33    35
   34    10    11       61    22    23       88    34    35
   35    11    12       62    22    24       89    34    36

For N > 90, r is approximately the integer next below $(N - 1)/2 - k\sqrt{N + 1}$, with
k = 0.9800 and 0.8224 for the 5% and 10% values, respectively.
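The binomial probability quoted for the hemoglobin example, and critical values of the kind tabulated above, can be computed directly. The sketch below is an illustration with hypothetical function names. It takes as critical value the largest r whose doubled lower-tail probability does not exceed the stated level; since the table is described as approximate, this rule need not reproduce every entry.

```python
from math import comb


def sign_test_tail(r, n):
    """P(X <= r) when X is binomial with n trials and probability 1/2."""
    return sum(comb(n, i) for i in range(r + 1)) / 2**n


def critical_r(n, level):
    """Largest r with 2 * P(X <= r) <= level, or None if there is none."""
    best = None
    for r in range(n // 2 + 1):
        if 2 * sign_test_tail(r, n) <= level:
            best = r
        else:
            break
    return best


one_tailed = sign_test_tail(4, 12)
print(f"P(4 or fewer minus signs out of 12) = {one_tailed:.3f}")
print(f"two-tailed probability = {2 * one_tailed:.3f}")
print("5% and 10% critical values for N = 20:",
      critical_r(20, 0.05), critical_r(20, 0.10))
```

For a one-tailed test the same function is called with twice the desired level, which corresponds to halving the significance level of the two-tailed table.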

The sign test can be extended somewhat to answer such questions as the 
following: 

Is A better than B by p% or by u units? To answer these we merely
increase every measurement ($x_2$) on B by p% or by u units, and compare
the resulting set with the original measurements ($x_1$) on A, using a
one-tailed test.
Example 3. Suppose the data in Table 50 represent yields in bushels of two varieties of
apples, these varieties being grown on adjacent trees. Variety A is clearly superior (all
the signs of $x_1 - x_2$ are positive). Is it better than variety B by 5 bushels?

The numbers in column 4 (excluding zeros) give r = 2, N = 11, and this value of r is
significant at the 5% level. If we ask whether it is 6 bushels better, we get the numbers in
column 5, giving r = 5, N = 11, which is non-significant. We are therefore prepared to
say that A is better than B by as much as 5 bushels, but not by as much as 6 bushels.


Table 50. Comparison of Yields of Two Varieties by Sign Test

   $x_1$ (A)    $x_2$ (B)    $x_1 - x_2$    $x_1 - (x_2 + 5)$    $x_1 - (x_2 + 6)$
      13           11             2                -3                   -4
      12            6             6                 1                    0
      10            3             7                 2                    1
       6            1             5                 0                   -1
      13            7             6                 1                    0
      15           10             5                 0                   -1
      19            9            10                 5                    4
      10            4             6                 1                    0
      11            3             8                 3                    2
      11            6             5                 0                   -1
      13            8             5                 0                   -1
       9            5             4                -1                   -2
      14            7             7                 2                    1
      12            6             6                 1                    0
      12            4             8                 3                    2
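The counts used in Example 3 can be reproduced by shifting the yields of variety B and tallying signs. This is a minimal sketch with hypothetical function names; it reports the numbers of negative and positive differences (zeros dropped) together with the one-tailed binomial probability of so few negative signs.

```python
from math import comb

yields_a = [13, 12, 10, 6, 13, 15, 19, 10, 11, 11, 13, 9, 14, 12, 12]
yields_b = [11, 6, 3, 1, 7, 10, 9, 4, 3, 6, 8, 5, 7, 6, 4]


def shifted_sign_counts(a, b, shift):
    """Numbers of negative and positive values of a - (b + shift); zeros are dropped."""
    diffs = [x - (y + shift) for x, y in zip(a, b)]
    negatives = sum(d < 0 for d in diffs)
    positives = sum(d > 0 for d in diffs)
    return negatives, positives


def lower_tail(r, n):
    """P(X <= r) when X is binomial with n trials and probability 1/2."""
    return sum(comb(n, i) for i in range(r + 1)) / 2**n


for shift in (0, 5, 6):
    neg, pos = shifted_sign_counts(yields_a, yields_b, shift)
    n_used = neg + pos
    print(f"shift {shift}: {neg} minus, {pos} plus, N = {n_used}, "
          f"one-tailed P = {lower_tail(neg, n_used):.4f}")
```

With a shift of 5 bushels the one-tailed probability falls below 0.05, while with a shift of 6 it does not, which is the conclusion drawn in the example.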


13.11 Inequalities of the Tchebycheff Type. It was proved by
Tchebycheff (and independently by Bienaymé) that for any population, no matter
how queer the distribution (provided only that the variance is finite), the
probability that a random value of the variate x will differ from its expected
value by as much as $\lambda$ is not more than $\sigma^2/\lambda^2$.

In symbols,

(13.8) $\Pr\{|x - \mu| \ge \lambda\} \le \sigma^2/\lambda^2$

For example, the probability of a deviation from the expected value of at
least $3\sigma$ is never more than 1/9. Of course, if we know the form of the
distribution, we may be able to make this inequality much sharper. If the
distribution is normal, the probability of a deviation numerically as great as
$3\sigma$ is only 0.0027, but the point about the Tchebycheff inequality is that it is
true (with the restriction about the variance) for any distribution.


If we apply this inequality to the variate $\bar{x} = \frac{1}{N}\sum_i x_i$, where the $x_i$ are
independent variates with the same distribution, the variance of $\bar{x}$ is $\sigma^2/N$
and the inequality becomes

(13.9) $\Pr\{|\bar{x} - \mu| \ge \lambda\} \le \sigma^2/(N\lambda^2)$

Since any probability $\le 1$, the right-hand side must be less than 1 to
give a useful result. If we put it equal to 1/4, we see that $\lambda = 2\sigma/\sqrt{N}$,
so that $|\bar{x} - \mu|$ is of the order of $1/\sqrt{N}$. This means that the discrepancy
between the sample mean and the population mean is of the order of unity
divided by the square root of the sample size, a result which emphasizes the
importance of using large samples to get accurate estimates.

Various other inequalities of a similar nature are known. For instance, if
we know that the distribution has a single mode at $\mu_0$, we have the Gauss
inequality

(13.10) $\Pr\{|x - \mu_0| \ge \lambda\tau\} \le 4/(9\lambda^2)$

where

$\tau^2 = \sigma^2 + (\mu - \mu_0)^2$

If the distribution has a fourth moment $\mu_4$ then, as shown by Robbins,

(13.11) $\Pr\{|\bar{x} - \mu| \ge \lambda\} \le [\mu_4 + 3(N - 1)\sigma^4]/(N^3\lambda^4)$

Where the range of the variate is known to be bounded, upper limits can
be given for the moments. For example, if a distribution of heights is known
to lie between 64 and 78 in., the largest possible value of $\sigma^2$ is when half the
population has height 64 in. and half 78 in., and is therefore $7^2 = 49$ in.$^2$.
The corresponding value of $\mu_4$ is $7^4 = 2401$ in.$^4$. The probability that the
mean height for a sample of 200 individuals from this population will differ
from the mean of the population by as much as 1 in. is, from (13.9), not
more than 0.25 ($\lambda = 1$, $N = 200$, $\sigma^2 = 49$), and, from (13.11), not more
than 0.18.
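The two bounds in the height example follow by direct substitution in (13.9) and (13.11). The short sketch below, with illustrative function names, repeats that arithmetic, using the extreme two-point distribution on 64 and 78 in. to supply the largest possible variance and fourth moment.

```python
def chebyshev_mean_bound(variance, n, lam):
    """Upper bound (13.9) on P(|sample mean - mu| >= lam)."""
    return min(1.0, variance / (n * lam**2))


def fourth_moment_mean_bound(mu4, variance, n, lam):
    """Upper bound (13.11), which uses the fourth moment about the mean."""
    return min(1.0, (mu4 + 3 * (n - 1) * variance**2) / (n**3 * lam**4))


low, high = 64.0, 78.0
half_range = (high - low) / 2
max_variance = half_range**2    # half the population at each end of the range
max_mu4 = half_range**4

n, lam = 200, 1.0
bound_13_9 = chebyshev_mean_bound(max_variance, n, lam)
bound_13_11 = fourth_moment_mean_bound(max_mu4, max_variance, n, lam)

print(f"(13.9):  {bound_13_9:.3f}")
print(f"(13.11): {bound_13_11:.3f}")
```

Both bounds hold whatever the form of the distribution within the stated range, which is why they are so much weaker than the corresponding normal-theory probability.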

13.12 Order Statistics. The statistics most commonly used, such as the 
mean and standard deviation, depend on linear or quadratic combinations of 
the various observed values without regard to their order. Other statistics, 
including the median and other percentiles and the range, depend on the 
ordering of the observed values according to magnitude. These statistics 
are spoken of as order statistics or systematic statistics. 

Although much is known about the exact distributions of statistics of the 
first type, particularly for samples from a normal parent population, the 
exact distributions of order statistics are usually very troublesome to cal- 
culate. Tables have been calculated for a normal parent population, and 
some results are known for rectangular and other special populations, and for 
very small samples. 

Reference 8 (at the end of the chapter) gives an interesting treatment of
some of the more important results in the sampling theory of order statistics
and their applications to statistical inference.

13.13 The Median. If the parent population is symmetrical, the sample
median may be regarded as an estimate of the population mean $\mu$. It is an
unbiased estimate, since its expected value is $\mu$, but its sampling variance is
often larger than the sampling variance of the arithmetic mean, and therefore
it is not as efficient an estimate. For a normal population, the mean is most
efficient (no other unbiased statistic for estimating $\mu$ can have a smaller
variance), and the efficiency of the median is measured by the ratio of the
variance of the mean to the variance of the median. This efficiency depends
on the sample size, and for large N it tends to the value $2/\pi = 0.637$. That
is to say, we can get, on the average, as good a value of $\mu$ from the mean of
637 observations from a normal population as from the median of 1000.

For small samples, the median is relatively more efficient. Thus for samples
of size N, the efficiency is given by E in the following table:

N     3      4      5      6      7      8      9     10
E   0.74   0.84   0.69   0.78   0.67   0.74   0.65   0.71

The efficiency is higher when N is even because then the median is defined
as the arithmetic mean of the two middle values.

Confidence limits for the median of the parent population obtained from
a small sample have been calculated by K. R. Nair (Reference 4). These
limits are non-parametric; they make no assumption about the nature of the
parent population, except that the variate is continuous. We suppose the
observations arranged in ascending order of magnitude, $x_1 < x_2 < \cdots < x_N$,
and make a statement that the median lies between $x_k$ and $x_{N-k+1}$. The 95%
confidence limits are given by the largest k for which the probability that the
statement is true is at least 0.95. Thus, if N = 20, we find k = 6. The
probability, before drawing a sample of 20, that the median of the population
will lie between the 6th and the 15th observations, when these are arranged
in order, is at least 0.95 and, in fact, is 0.959. An extract from Nair's table
is given in Table 51.
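The probability quoted for N = 20 can be reproduced by a short calculation. The sketch below assumes the usual sign-test argument, which is not spelled out in the text: each observation independently falls below the population median with probability 1/2, so the statement fails only when fewer than k observations lie on one side of the median. The function names are illustrative.

```python
from math import comb


def median_interval_coverage(n, k):
    """Probability that the population median lies between x_k and x_(n-k+1)."""
    # Number of ways that fewer than k of the n observations fall below the median.
    one_sided = sum(comb(n, i) for i in range(k))
    # The same count applies to observations above the median, hence the factor 2.
    return 1 - 2 * one_sided / 2 ** n


def largest_k(n, level=0.95):
    """Largest k whose interval still has coverage of at least `level` (0 if none)."""
    k = 0
    while median_interval_coverage(n, k + 1) >= level:
        k += 1
    return k


print(largest_k(20), round(median_interval_coverage(20, 6), 3))
```

For N = 20 this gives k = 6 with probability 0.959, in agreement with the figures above.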

Table 51. Approximate 95% Confidence Intervals for the Median
(The confidence coefficient is P that the population median lies between $x_k$ and $x_{N-k+1}$)

 N     k      P         N     k      P
10     2    0.979      31    10    0.971
11     2    0.988      32    10    0.980
12     3    0.961      33    11    0.965
13     3    0.978      34    11    0.976
14     3    0.987      35    12    0.959
15     4    0.965      36    12    0.971
16     4    0.979      37    13    0.953
17     5    0.951      38    13    0.966
18     5    0.969      39    13    0.976
19     5    0.981      40    14    0.962
20     6    0.959      41    14    0.972
21     6    0.973      42    15    0.956
22     6    0.983      43    15    0.968
23     7    0.965      44    16    0.951
24     7    0.977      45    16    0.964
25     8    0.957      46    16    0.974
26     8    0.971      47    17    0.960
27     8    0.981      48    17    0.971
28     9    0.964      49    18    0.956
29     9    0.976      50    18    0.967
30    10    0.957

The distribution of the median of samples from a population with a given
frequency function f(x) can be calculated, under certain assumptions, for
the limit when the size of the sample tends to infinity. Let the number
of elements in the sample be 2n + 1, so that the median is the value $x_{n+1}$.
If $x_0$ is the population median, so that

$\int_{-\infty}^{x_0} f(x)\,dx = 1/2$

then it can be proved that for large values of n the sample median $\tilde{x}$ is approxi-
mately normally distributed with mean $x_0$ and variance $1/[8n f^2(x_0)]$, where
$f^2(x_0)$ is the square of $f(x_0)$. This is called an asymptotic distribution. If the
parent population is normal, then

$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \mu)^2/2\sigma^2}$

and $x_0 = \mu$. Hence $f(x_0) = 1/[\sigma\sqrt{2\pi}]$, so that the variance of $\tilde{x}$ is

$2\pi\sigma^2/8n = \pi\sigma^2/4n$

Since the variance of $\bar{x}$ is $\sigma^2/(2n + 1)$, the efficiency of the median is

$4n/[\pi(2n + 1)]$

or approximately $2/\pi$.
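As a rough numerical illustration of this limit, the sketch below evaluates the ratio of the two variances for a sample of 2n + 1 observations. It assumes the asymptotic variance $\pi\sigma^2/4n$ for the median, so it should not be expected to reproduce the small-sample efficiencies tabulated earlier in this section; the function name is illustrative.

```python
from math import pi


def median_efficiency_large_sample(n):
    """Var(mean) / asymptotic Var(median) for a normal sample of size 2n + 1."""
    var_mean = 1.0 / (2 * n + 1)   # in units of sigma^2
    var_median = pi / (4 * n)      # in units of sigma^2, valid only for large n
    return var_mean / var_median


for n in (50, 500, 5000):
    print(2 * n + 1, median_efficiency_large_sample(n))
```

The printed values approach $2/\pi$ from below as n increases.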

13.14 Estimation by Percentiles. The average of the quartiles
$(Q_1 + Q_3)/2$, or $(P_{25} + P_{75})/2$ in the notation of percentiles, is an efficient
estimate of the mean of a normal distribution, the efficiency being around
0.87 for samples of 10 and falling off slowly to 0.81 for very large samples.
Still higher efficiencies may be obtained by using more percentiles, for instance,
0.88 for $(P_{17} + P_{50} + P_{83})/3$ in large samples. These and other percentiles
may be very rapidly found mechanically when the data are entered on cards,
one card for each value. The cards are arranged in order of magnitude of
the observations and run through a sorting machine. If we have 300 cards,
we run 51 through the machine and read the 51st card, run another 99 through
and read the 150th, and finally run yet another 99 through and read the 249th.
These three cards will give $P_{17}$, $P_{50}$, and $P_{83}$ respectively.

Dispersion also may be estimated by percentiles. If only two are used,
the sampling variance of the estimate of $\sigma$ is least when the percentiles used
are $P_{07}$ and $P_{93}$, and the efficiency is then 0.65 for large samples. For a normal
population, the interval $P_{93} - P_{07}$ is equal to $2.952\sigma$, as is easily found by
interpolation in the table of areas for the standard normal curve. The estimate of
$\sigma$ from these two percentiles is $(P_{93} - P_{07})/2.952 = 0.3388\,(P_{93} - P_{07})$. If
four percentiles are used, the best estimate of $\sigma$ is $0.1714\,(P_{97} + P_{85} - P_{15} - P_{03})$,
with an efficiency of 0.80.

If we want to estimate both the mean and the standard deviation by the
same percentiles, we have to compromise between the requirements for these
two statistics separately. The percentiles which give the best estimates of
$\mu$ are not those which give the best estimates of $\sigma$. For estimation with two
percentiles, we can take

(13.12)  $\mu = (P_{15} + P_{85})/2$,              efficiency 0.73
         $\sigma = 0.4824\,(P_{85} - P_{15})$,     efficiency 0.56

and for estimation with four percentiles,

(13.13)  $\mu = (P_{05} + P_{30} + P_{70} + P_{95})/4$,              efficiency 0.80
         $\sigma = 0.2305\,(P_{95} + P_{70} - P_{30} - P_{05})$,     efficiency 0.74


Percentile estimation is particularly useful in situations where it is compara- 
tively easy to arrange the sample values in order, but much more troublesome 
to carry out the actual measurements. The exact measurements are required 
only for the percentile values used in the estimation, and not for the whole 
sample. 
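The numerical multipliers used for estimating $\sigma$ in this section follow from the normal curve, since for a normal population $P_{100p} - P_{100(1-p)} = 2z_p\sigma$, where $z_p$ is the standard normal deviate with cumulative area p. The following sketch recomputes them with Python's standard library; the function name is illustrative.

```python
from statistics import NormalDist


def sigma_multiplier(upper_fractions):
    """Constant c such that c * (sum of upper percentiles - sum of the
    symmetric lower percentiles) estimates sigma for a normal population."""
    z = NormalDist().inv_cdf  # standard normal deviate for a given area
    return 1.0 / (2.0 * sum(z(p) for p in upper_fractions))


print(round(sigma_multiplier([0.93]), 4))        # P93 - P07
print(round(sigma_multiplier([0.97, 0.85]), 4))  # P97 + P85 - P15 - P03
print(round(sigma_multiplier([0.85]), 4))        # P85 - P15
print(round(sigma_multiplier([0.95, 0.70]), 4))  # P95 + P70 - P30 - P05
```

The four values agree with the multipliers 0.3388, 0.1714, 0.4824 and 0.2305 given in the text. The efficiencies quoted alongside them are not computed here.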

13.15 The Range. The range is the difference between the largest and
smallest observations in the sample, namely

(13.14) $R = x_N - x_1$

As an estimate of dispersion for samples from a normal population the range
is very poor for large N, but for small samples its efficiency is remarkably high.
This, of course, does not mean that the range of a very small sample will give
a very good estimate of the population standard deviation; it means merely
that the range is almost as good in this respect as the sample standard
deviation. With a small sample no estimate can be very precise. Table 52 gives
the values of the multiplier k used in estimating $\sigma$ from R, and also the
corresponding efficiencies, for values of N from 2 to 10. The range is seldom used
for large samples.

Table 52. Estimation from the Range for a Normal Parent Population

                         99% Confidence Limits for $R/\sigma$
 N      k       E        lower      upper
 2    0.886    1.00      0.01       3.97
 3    0.591    0.99      0.13       4.42
 4    0.486    0.98      0.34       4.69
 5    0.430    0.96      0.55       4.89
 6    0.395    0.93      0.75       5.03
 7    0.370    0.91      0.92       5.16
 8    0.351    0.89      1.08       5.26
 9    0.337    0.87      1.21       5.34
10    0.325    0.86      1.33       5.42

$\hat{\sigma} = kR$, efficiency = E


This table may be used in forming limits for a control chart, the mean range
$\bar{R}$ being obtained from a considerable number of samples each of size N, and
the standard deviation of the population estimated from

(13.15) $\hat{\sigma} = k\bar{R}$
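A minimal sketch of this estimate is given below, assuming a normal parent population and samples of equal size. The multipliers are those of Table 52; the entry for N = 3 is taken as 0.591, the reciprocal of the usual control-chart constant 1.693. The function and variable names are illustrative.

```python
# Multipliers k from Table 52, keyed by sample size N (normal parent assumed).
# N = 3 uses 0.591, the reciprocal of the usual control-chart constant 1.693.
RANGE_MULTIPLIER = {2: 0.886, 3: 0.591, 4: 0.486, 5: 0.430, 6: 0.395,
                    7: 0.370, 8: 0.351, 9: 0.337, 10: 0.325}


def sigma_from_mean_range(samples):
    """Estimate sigma as k times the mean range of equal-sized samples."""
    sizes = {len(s) for s in samples}
    if len(sizes) != 1:
        raise ValueError("all samples must have the same size N")
    n = sizes.pop()
    if n not in RANGE_MULTIPLIER:
        raise ValueError("Table 52 covers only N from 2 to 10")
    mean_range = sum(max(s) - min(s) for s in samples) / len(samples)
    return RANGE_MULTIPLIER[n] * mean_range


# Two made-up samples of size 5 with ranges 4 and 8, so the mean range is 6.
print(sigma_from_mean_range([[1, 2, 3, 4, 5], [2, 4, 6, 8, 10]]))
```

In practice the mean range would be taken over many more samples than the two used here for illustration.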

The distribution function for the range in samples from a population with
known frequency function is expressible in terms of integrals but is difficult
to evaluate, even for a normal population. The only important case for
which the range turns out to have a simple distribution function is that of a
rectangular parent population (one for which $f(x) = 1/c$, $0 < x < c$, and
$f(x) = 0$ for all other values of x). The distribution function for the range
R of samples from a rectangular population, that is, the probability for a
value less than R, is given by

(13.16) $F(R) = N(R/c)^{N-1} - (N - 1)(R/c)^N$

The probability of a value between R and R + dR is

(13.17) $f(R)\,dR = N(N - 1)(R/c)^{N-2}(1 - R/c)\,dR/c$

For the normal law, H. O. Hartley (Reference 5) has calculated tables giving
the probability F(R) for values of N between 2 and 20. Upper and lower
99% confidence limits for the ratio of R to $\sigma$ are given in Table 52 for sample
sizes from 2 to 10.
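The relation between the distribution function and the frequency function of the range in the rectangular case can be checked numerically. The sketch below assumes the forms $F(R) = N(R/c)^{N-1} - (N - 1)(R/c)^N$ and $f(R) = N(N - 1)(R/c)^{N-2}(1 - R/c)/c$ for $0 \le R \le c$ and $N \ge 2$; the function names are illustrative.

```python
def range_cdf(r, n, c=1.0):
    """Probability that the range of n rectangular observations is less than r."""
    t = r / c
    return n * t ** (n - 1) - (n - 1) * t ** n


def range_pdf(r, n, c=1.0):
    """Frequency function of the range, the derivative of range_cdf."""
    t = r / c
    return n * (n - 1) * t ** (n - 2) * (1 - t) / c


def integrate(f, a, b, steps=20000):
    """Midpoint-rule integral of f from a to b."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


n, c, r = 5, 2.0, 1.0
print(range_cdf(r, n, c))
print(integrate(lambda x: range_pdf(x, n, c), 0.0, r))
```

The two printed numbers agree closely, as they should if the frequency function is the derivative of the distribution function.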

13.16 Quotient of Ranges in Samples from a Rectangular Population.

A rectangular population is not quite as artificial as it appears at first sight.
It has been asserted that errors of observation in accurate physical
measurements are in reality more nearly rectangular than normal: very large errors
are not merely rare but do not occur at all (apart from gross mistakes, which
are usually obvious). Also, in the production of machine parts in a factory
to rather narrow specification limits, when only those articles which comply
with the specification are included in the population, it appears that the
hypothesis of a rectangular distribution is not unreasonable.

Let us suppose that we have two random samples of sizes m and n, with
ranges $R_1$ and $R_2$, from the population specified above. The distribution of
$R_1/R_2 = u$ was worked out by Rider (Reference 6) and turns out to be
independent of c. The frequency curve is a skew curve, with the mode at

(13.18) $\tilde{u} = (m - 2)(m + n)/[(m - 1)(m + n - 2)]$,  $m - n \le 2$

or

(13.19) $\tilde{u} = (n + 1)(m + n - 2)/[n(m + n)]$,  $m - n \ge 2$

and the mean at

(13.20) $\mu_1' = (m - 1)n/[(m + 1)(n - 2)]$

Its equation is

(13.21) $f(u) = \frac{m(m - 1)n(n - 1)}{(m + n)(m + n - 1)(m + n - 2)}\,[(m + n)u^{m-2} - (m + n - 2)u^{m-1}]$

for $0 < u < 1$, and

(13.22) $f(u) = \frac{m(m - 1)n(n - 1)}{(m + n)(m + n - 1)(m + n - 2)}\,[(m + n)u^{-n} - (m + n - 2)u^{-n-1}]$

for $1 < u < \infty$.

Table 53, giving values for the quotient of ranges which will be exceeded
in 5% of random samples, may be used to test the hypothesis that two samples
come from the same rectangular population.


Table 53. 5% Points for the Distribution of the Quotient of Ranges
from a Rectangular Population

           m = number in sample with greater range
  n       3       4       5       6       7       8       9      10
  3     4.00    4.64    5.08    5.39    5.63    5.82    5.97    6.09
  4     2.31    2.62    2.83    2.99    3.11    3.20    3.28    3.34
  5     1.75    1.96    2.10    2.20    2.28    2.34    2.40    2.44
  6     1.49    1.64    1.75    1.83    1.89    1.94    1.98    2.01
  7     1.34    1.46    1.55    1.61    1.66    1.70    1.73    1.76
  8     1.24    1.35    1.42    1.47    1.52    1.55    1.58    1.60
  9     1.17    1.27    1.33    1.38    1.42    1.45    1.47    1.49
 10     1.13    1.21    1.27    1.31    1.34    1.37    1.39    1.41

Example 4. The width of a slot in a certain airplane part was measured to the thousandth
of an inch in a sample of 5 parts on the first day of production, and again in a sample of
10 two days later. The results (in thousandths of an inch in excess of 0.800 in.) were

I    77, 80, 78, 72, 78    ($R_1 = 8$)

II   76, 77, 76, 76, 77, 79, 75, 78, 77, 76    ($R_2 = 4$)

The null hypothesis is that both samples come from the same rectangular population.
Taking m as the number in the sample with the larger range, we have

u = 2, m = 5, n = 10

From Table 53, the 5% point is 1.27, so that the value of u is significant. The actual
probability of getting a value of u as large as 2 is 0.0013, which can be calculated by integrating
equation (13.22) for the distribution of u. The conclusion is that the second sample is
significantly more uniform than the first. The machine has settled down to a steadier
production.

Note that this test is analogous to the F-test discussed in §12.14. The null 
hypothesis is that the quotient of ranges is 1, and it is tested against the 
alternative hypothesis that the quotient is greater than 1. If we want to test 
the null hypothesis against the alternative that the quotient is either greater 
than or less than 1, the 5% level becomes a 10% level. In Example 4, we 
should reject the null hypothesis at the 5% level if either 

u > 1.27 (m = 5, n = 10) 

or 

1/u > 2.44 (m = 10, n = 5)

and the probability of the combined event is 0.10.
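The tail probability quoted in Example 4 can be reproduced by integrating the frequency function of u from the observed value to infinity. The sketch below assumes the form of (13.22) valid for $u > 1$, namely a constant times $(m + n)u^{-n} - (m + n - 2)u^{-n-1}$, and needs $n \ge 2$; the function name is illustrative.

```python
def quotient_upper_tail(u, m, n):
    """P(R1/R2 > u) for u >= 1, with sample sizes m (numerator) and n."""
    if u < 1:
        raise ValueError("this closed form applies only for u >= 1")
    total = m + n
    const = m * (m - 1) * n * (n - 1) / (total * (total - 1) * (total - 2))
    # Term-by-term integral of (m+n) x^(-n) - (m+n-2) x^(-n-1) from u to infinity.
    return const * (total * u ** (1 - n) / (n - 1) - (total - 2) * u ** (-n) / n)


print(quotient_upper_tail(2.0, 5, 10))    # Example 4
print(quotient_upper_tail(1.27, 5, 10))   # tabulated 5% point for m = 5, n = 10
print(quotient_upper_tail(4.0, 3, 3))     # tabulated 5% point for m = n = 3
```

The first value is about 0.0013, as stated in Example 4, and the other two are close to 0.05, as expected for entries of the table of 5% points.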

Exercises 

1. Test the goodness of fit of the Poisson distribution to results of a sampling experiment
in Table 36, §10.8. Note that two sets of calculated frequencies are given, one
corresponding to an assumed value of $\lambda$ and the other to an estimated value. The degrees of freedom
are one fewer in the latter case.

2. Test the goodness of fit of the Poisson distributions fitted in Exercises 16 and 17,
page 160.

3. The distribution of Table 26, page 87, was graduated by means of a normal curve
in Exercise 16, page 122. Test the goodness of fit.

4. Try for yourself the procedure described in Example 2, §13.7, for forming a set of
random numbers (all these numbers lie between 0 and 45, inclusive). Collect 80 such
numbers and test the sequence for randomness.

5. A student dealt 26 cards from an ordinary deck 50 times and each time counted the
number of Honors in the 26 cards dealt. (A, K, Q, J, 10 counted as Honors.) The
distribution obtained was:

x       4    5    6    7    8    9   10   11   12   13   14   15   16   17
$f_o$   1    0    2    3    7    6   10    8    2    7    2    0    2    1

Would you reject the hypothesis that the cards were well shuffled between each deal?

Hint. The probability on this hypothesis of x honor cards in 26 is C(20, x)C(32, 26 - x)/
C(52, 26). If this probability is calculated and multiplied by 50, the theoretical distribution
is

x        4     5     6     7     8     9     10     11    12    13    14    15    16
$f_t$   0.0   0.2   0.9   2.7   6.0   9.6   11.2   9.6   6.0   2.7   0.9   0.2   0.0

Compare the two distributions by the $\chi^2$ test.
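The theoretical frequencies in the hint can be recomputed directly from the stated probability C(20, x)C(32, 26 - x)/C(52, 26). The sketch below does only that; it does not carry out the comparison asked for in the exercise. The function name and its default arguments are illustrative.

```python
from math import comb


def expected_honor_counts(deals=50, hand=26, honors=20, deck=52):
    """Expected number of deals showing x honor cards, for each possible x."""
    denominator = comb(deck, hand)
    return {
        x: deals * comb(honors, x) * comb(deck - honors, hand - x) / denominator
        for x in range(honors + 1)
    }


expected = expected_honor_counts()
for x in range(4, 17):
    print(x, round(expected[x], 1))
```

The rounded values for x = 4 to 16 can be compared with the theoretical distribution given in the hint.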

6. In the following table x is the number of fives or sixes observed in a single throw with
five dice.

x        0    1    2    3    4    5
$f_o$   23   90   81   30   19    0

Would you reject the hypothesis that the dice are true?

Hint. The probability on the null hypothesis of x fives or sixes with 5 dice is
$C(5, x)(1/3)^x(2/3)^{5-x}$.

7. Two samples of machine parts gave the following measurements (unit thousandth of
an inch):

I    801, 798, 800, 806, 800, 804

II   796, 797, 796, 798, 794, 796

Do these differ significantly by the run test?

Ans. Number of runs is 2 or 4. The 5% significance value is 3.

8. Use the table of random numbers (Appendix Table V) to select a set of 100 random
samples of 10 from the population of Table 38, §12.2. Find the mean of these samples and 
compare with the mean of the parent population. 

Hint. Form a table of blocks of random numbers like that in Table 48. 

9. Use the table of random numbers and the blocks given in Table 48 to draw a subsample 
of 100 from the population of 1000 Glasgow schoolgirls. Form this subsample into a 
frequency distribution and calculate its mean and variance. Compare with the theoreti- 
cal values. 

10. The following distribution of the range was obtained in 200 samples of 10 from an
artificially constructed, approximately normal population with mean 20 and standard
deviation 4:

R     5    6    7    8    9   10   11   12   13
f     2    4    4   14   11   20   25   28   25

R    14   15   16   17   18   19   20   21   22   23
f    17   13   13    5    9    3    3    3    0    1

Calculate the mean range and estimate the standard deviation of the population. Compare
with the true value 4.

Estimate the 99% limits for $R/\sigma$ from this sample and compare with the values given in
Table 44, §12.17, for N = 10.

11. A distribution is symmetrical with a single mode x = 0 and is bounded between
x = -b and x = b. Find 95% limits for the deviation of the mean of a sample of 100 from
the population mean. Ans. $|\bar{x}| < 0.16b$, with a probability of 0.95.

Hint. Use inequality (13.11). The greatest possible values of $\sigma^2$ and $\mu_4$ will be less
than the values for a rectangular distribution f(x) = 1/2b, -b < x < b. For this
distribution, $\sigma^2 = b^2/3$ and $\mu_4 = b^4/5$. Put the probability that $|\bar{x} - 0| > \lambda$ equal to 0.05 and
solve the equation so obtained for $\lambda$.

CHAPTER XIV
TIME SERIES

14.1 Time as a Variable. Hitherto we have considered the distribution of a
single variable, first from the descriptive standpoint and then from the point
of view of estimation and significance. We now have to take up problems in
which there are two variable quantities, and in the present chapter one of
the two variables will be time. A set of data depending on the time is
called a time series.

The time variable, of course, does not fluctuate arbitrarily. It moves
uniformly, always in the same direction, from past to future. We can often,
however, exercise some freedom of choice as to the times at which we make
observations, although in most instances it is convenient to observe at regular
intervals.

In the typical time series there are discernible three main features which
seem to be independent of one another and attributable to distinct causes:
(1) a broad long-term movement, called the trend, such as a more or less
steady rise or fall; (2) an oscillation about the trend, which may be a seasonal
effect with fairly regular period or a rather long-period, somewhat irregular
oscillation, often called a cycle; (3) an irregular, unsystematic or random
component, sometimes called the residual.



[Fig. 50. Relative Sunspot Numbers. Horizontal axis: year, 1900 to 1946.]

Not all time series exhibit all three of these features. Three series are 
graphed in Figs. 49, 50 and 51. Fig. 49 gives the population of the U.S.A. at 
decennial censuses since 1790 and is mainly trend with a small residual element 
of randomness. Fig. 50 gives the mean annual sunspot number (Wolf and 
Wolfer system) for fifty years, and the cyclical aspect is much in evidence, 
combined with a random effect. There is very little evidence of trend. 
Fig. 51 gives the total annual precipitation at Edmonton, 1900-1950, and is 
evidently almost entirely random. 

One task of time series analysis is to disengage these separate elements
from a given set of data, so as to exhibit the trend and the oscillations, if
any, apart from the random fluctuations. Trends and cycles are important
from the point of view of interpolation and also of attempted forecasts. The
trend of population statistics can be interpolated to give estimates of the
population for years in which no census is taken. Estimates so made are
automatically corrected by the next following census. Also, if an economic
time series shows a well-marked trend, with or without a superimposed
"business cycle," it may be worth while to extrapolate for a year or two into
the future. But the results should be interpreted very cautiously, as they
imply an assumption about the continuance of the causes which have led to
the trend or the cycle, and, in the world of economics, conditions do not
normally remain unchanged for very long.

14.2 Moving Averages. One method of smoothing out irregularities in
a series in order to exhibit the trend is that of moving averages. (If there are
pronounced seasonal fluctuations in the data, these should be removed first,
in a way which will be described later.) We suppose for convenience that the
successive observations are made at equal intervals of time which we will take
as the unit of time (usually a year or a month), and we call the successive
values $y_0, y_1, y_2, \ldots$ If we need to go back in time before the instant we
choose as our zero, we can write $y_{-1}$, $y_{-2}$, etc. For a simple moving average
of 5 we take the mean of $y_0$, $y_1$, $y_2$, $y_3$, and $y_4$, and place it at x = 2 (the middle
time), then the mean of $y_1$, $y_2$, $y_3$, $y_4$, and $y_5$ and place it at x = 3, and so on
all along our series. Calling these means $\bar{y}_2, \bar{y}_3, \ldots$, we have

(14.1)  $\bar{y}_2 = (y_0 + y_1 + y_2 + y_3 + y_4)/5 = S_2/5$
        $\bar{y}_3 = (y_1 + y_2 + y_3 + y_4 + y_5)/5 = S_3/5$
        $\bar{y}_4 = (y_2 + y_3 + y_4 + y_5 + y_6)/5 = S_4/5$

In practice it is usually quicker, especially with a calculating machine, to
get each successive sum from the previous one by adding one new term and
subtracting one old one. Thus,

$S_3 = S_2 - y_0 + y_5$

$S_4 = S_3 - y_1 + y_6$
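The running-sum shortcut just described can be sketched as follows. This is an illustrative implementation, not a prescribed one; each average belongs to the middle time of its window when the number of terms k is odd.

```python
def simple_moving_average(y, k):
    """Simple moving averages of k terms, updating the sum one term at a time."""
    if k < 1 or k > len(y):
        raise ValueError("k must be between 1 and the length of the series")
    running_sum = sum(y[:k])          # S for the first window
    averages = [running_sum / k]
    for i in range(k, len(y)):
        # Add the one new term and subtract the one old term.
        running_sum += y[i] - y[i - k]
        averages.append(running_sum / k)
    return averages


print(simple_moving_average([1, 2, 3, 4, 5, 6, 7], 5))
```

For this made-up series of seven values, a moving average of 5 yields three means, placed at x = 2, 3 and 4.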


Instead of a simple moving average, a weighted average is often used. The
coefficients (or weights) are generally binomial. Thus, a weighted average
of 5 would use the coefficients 1, 4, 6, 4, 1 which arise in the expansion of the
binomial $(q + p)^4$. The formula for the successive terms of the sequence
of averages is

(14.2)  $\bar{\bar{y}}_2 = (y_0 + 4y_1 + 6y_2 + 4y_3 + y_4)/16$
        $\bar{\bar{y}}_3 = (y_1 + 4y_2 + 6y_3 + 4y_4 + y_5)/16$

Such an average is most easily computed by taking a binomial average of
3 and another binomial average of 3 of the first set of averages. If

$\bar{y}_1 = (y_0 + 2y_1 + y_2)/4$
$\bar{y}_2 = (y_1 + 2y_2 + y_3)/4$
$\bar{y}_3 = (y_2 + 2y_3 + y_4)/4$

and if

$\bar{\bar{y}}_2 = (\bar{y}_1 + 2\bar{y}_2 + \bar{y}_3)/4$
$\bar{\bar{y}}_3 = (\bar{y}_2 + 2\bar{y}_3 + \bar{y}_4)/4$

it is easily seen that $\bar{\bar{y}}_2 = (y_0 + 4y_1 + 6y_2 + 4y_3 + y_4)/16$, etc.
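The equivalence of the two procedures can be checked numerically. The sketch below applies the weights 1, 2, 1 twice and compares the result with a single pass using 1, 4, 6, 4, 1. The data are the mean sunspot numbers for 1900 to 1906 (the 1902 value, 5.0, is inferred from the tabulated sums); the function name is illustrative.

```python
def weighted_moving_average(y, weights):
    """Weighted moving average: one value for each full window of the series."""
    total = sum(weights)
    width = len(weights)
    return [
        sum(w * v for w, v in zip(weights, y[i:i + width])) / total
        for i in range(len(y) - width + 1)
    ]


# Mean sunspot numbers, 1900-1906 (1902 value inferred from the tabulated sums).
sunspots = [9.5, 2.7, 5.0, 24.4, 42.0, 63.5, 53.8]

direct = weighted_moving_average(sunspots, [1, 4, 6, 4, 1])
two_pass = weighted_moving_average(
    weighted_moving_average(sunspots, [1, 2, 1]), [1, 2, 1])

print(direct)
print(two_pass)
```

The two lists agree apart from floating-point rounding. A hand computation that rounds the intermediate averages of 3 to one decimal place may differ slightly in the last figure.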

The use of the weighted rather than the simple average tends to produce a
smoother curve while preserving the main features of the time series. An
example of the computation for a weighted average of 5 for the sunspot data
of Fig. 50 is given in Table 54, and the results are plotted in Fig. 50. The
moving average has evidently smoothed out the random variations while
preserving the general character of the oscillation.

Table 54. Calculation of Binomially-weighted Average of 5 for Sunspot Data

Sum of 3 (first) = $y_{i-1} + 2y_i + y_{i+1}$; Sum of 3 (second) = $\bar{y}_{i-1} + 2\bar{y}_i + \bar{y}_{i+1}$

Year      Mean Sunspot    Sum of 3    Average of 3    Sum of 3    Average of 3
$x_i$     Number $y_i$    (first)     $\bar{y}_i$     (second)    $\bar{\bar{y}}_i$
1900         9.5
  01         2.7            19.9          5.0
  02         5.0            37.1          9.3            47.6         11.9
  03        24.4            95.8         24.0           100.3         25.1
  04        42.0           171.9         43.0           165.7         41.4
  05        63.5           222.8         55.7           212.7         53.2
  06        53.8           233.1         58.3           228.9         57.2
  07        62.0           226.3         56.6           222.2         55.5
  08        48.5           202.9         50.7           196.7         49.2
  09        43.9           154.9         38.7           149.8         37.5
1910        18.6            86.8         21.7            90.5         22.6
  11         5.7            33.6          8.4            42.1         10.5
  12         3.6            14.3          3.6            19.6          4.9
  13         1.4            16.0          4.0            28.6          7.1
  14         9.6            68.0         17.0
  15        47.4

How many terms should be included in the moving average is a matter of
judgment in each particular case. If it is desired to smooth out an
oscillation of regular period, the number of terms should exactly cover the period.
The period of the sunspot cycle is about 11.2 years, but is rather irregular.
The effect of a simple moving average of 11 is shown in Fig. 50. For many
commercial data (sales, etc.) there is a well-marked seasonal effect which is
smoothed out by taking an average of 12 for monthly data. As a rule it is
best to use an odd number of terms in a moving average so that the average
can be attached to the time coordinate of the middle term. With the seasonal
data the moving average can be centered on a month by taking an average
of 2 of an average of 12, which will attach the final average to the 7th month
of the original 12.

More complicated moving averages are sometimes used by actuaries in
smoothing long series. One such is Spencer's 15-point formula, with weights
-3, -6, -5, 3, 21, 46, 67, 74, 67, 46, 21, 3, -5, -6, -3. This can be
obtained by taking a weighted average of 5 with weights -3, 3, 4, 3, -3,
then a simple average of 5, and finally two simple averages of 4.

One defect in a moving average of 2k + 1 terms is that we lose k terms at
the beginning and k at the end. This may be serious in comparatively short
series and is an argument for using small values of k.
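The construction of Spencer's 15-point weights from successive averages can be verified by combining the sets of weights, since applying one moving average after another amounts to convolving their weights. The sketch below is illustrative; the helper name `convolve` is not from the text.

```python
def convolve(a, b):
    """Weights of the moving average equivalent to applying a and then b."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


weights = [-3, 3, 4, 3, -3]                      # weighted average of 5
for simple in ([1] * 5, [1] * 4, [1] * 4):       # simple averages of 5, 4 and 4
    weights = convolve(weights, simple)

print(weights)
print(sum(weights))   # divisor needed to turn the weighted sum into an average
```

The resulting list reproduces the fifteen weights quoted above, and their total is the divisor of the formula.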

14.3 The Slutzky-Yule Effect. If a moving average is used to determine
trend, it will also have an effect on the genuinely oscillatory component (if
any) of the time series. A long-period oscillation tends to be included as
part of the trend, whereas oscillations comparable in period with the length
of the moving average or even shorter are damped out. But there is also an
effect on the purely random component. It was proved by Slutzky and by
Yule (independently) that a moving average may generate an irregular
oscillatory movement where none existed in the original data. This is the
Slutzky-Yule effect. For a simple moving average of length k, the variance of the
induced oscillation is 1/k times the variance of the random component, and
the average length of the oscillation is $360/\theta$ where $\theta$ is the angle (in degrees)
in the first quadrant with cosine $(k - 1)/k$. The effect is increased when
weighted averages are used. It is necessary in discussing the reality of
possible oscillatory components in a time series to consider whether or not
they may be spurious.
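The average length of the induced oscillation is easy to tabulate from the rule just stated. The sketch below simply evaluates $360/\theta$ with $\cos\theta = (k - 1)/k$; it takes the rule as given and does not simulate a random series. The function name is illustrative.

```python
from math import acos, degrees


def induced_oscillation_length(k):
    """Average length 360/theta, where theta (degrees) has cosine (k - 1)/k."""
    theta = degrees(acos((k - 1) / k))
    return 360.0 / theta


for k in (3, 5, 11):
    print(k, induced_oscillation_length(k))
```

For a simple moving average of 5, for example, the rule gives an average length of a little under 10 time units, so a purely random series smoothed in this way may suggest a cycle of about that length.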

14.4 Mathematical Trend Lines. A trend obtained by the method of
moving averages, even though fairly smooth, is not, as a rule, conveniently
representable by a mathematical equation. For interpolation and
extrapolation the advantages of a mathematically expressed trend line are obvious.
The attempt is therefore often made to fit the observed data with a fairly
simple curve, and the simplest of all is the straight line. Even when a straight
line clearly will not fit over a long interval of time it is sometimes possible to
break the interval up into subintervals over each of which the trend is
approximately linear, and thus to "splice" two or more trend lines together. In
other circumstances we may try to fit parabolic, cubic, or even higher
polynomial curves, or use an exponential curve. Still more complicated curves
are often used by actuaries in smoothing population and mortality statistics.
We consider first the straight line trend.

14.5 Linear Functions. We know from algebra that the general form of
a linear equation in two variables is

$Ax + By = C$

where A, B and C are arbitrary constants.

When $B \neq 0$, the equation may be solved for y, giving $y = -(A/B)x + C/B$,
which is of the form

(14.3) $y = a + bx$

and which is the form we will ordinarily use to represent a straight line.

The special cases where A or B or C is zero are as follows:

When A = 0, then y = C/B, which is of the form y = a. This is a line
parallel to the x-axis. When B = 0, the equation takes the form x = C/A,
which is a line parallel to the y-axis. When C = 0, then Ax + By = 0,
which is a line passing through the origin.

The graph of (14.3) is a straight line (which explains the term "linear").
A characteristic property of a linear function is revealed at once by its graph.
This is the fact that the ratio of a change in y to the corresponding change in
x is constant. Thus, if two points $(x_1, y_1)$ and $(x_2, y_2)$ are chosen on the line,
the value of the ratio

$b = (y_2 - y_1)/(x_2 - x_1)$

is independent of the points chosen. This ratio gives the average rate of
change of any function over the interval $\Delta x = x_2 - x_1$. In the case of a
linear function, b defines the rate of change of the function.

Graphically, b is the slope of the line. If the units of x and y are identical and
the scales are the same, b is the tangent of the angle of inclination $\theta$ which the line
makes with the positive x-axis.* Lines having the same slope are parallel, and
conversely.

It is shown in analytic geometry that we may obtain the slope of a straight line
from its equation if we solve for y and take the coefficient of x. Thus in
$2x - y = 5$, $y = 2x - 5$ and the slope is 2.

Conversely, if we know the slope of a line and the coordinates of any point
on the line we can write its equation from the relation

(14.4) $y - y_1 = b(x - x_1)$

which is called the point-slope form of a straight line. Thus, given that
(2, -1) is a point on a line whose slope is 2, the equation of the line is
therefore $y + 1 = 2(x - 2)$ or $2x - y = 5$.

Or again, remembering that b is defined by a ratio involving the coordinates
of two points on a line, we can obtain the equation of a line if we know any
two points which lie on it. From the definition of b and (14.4), we have

(14.5) $y - y_1 = \frac{y_2 - y_1}{x_2 - x_1}\,(x - x_1)$

which is known as the two-point form of the equation. Thus, given that
(2, -1) and (6, 7) are two points on a line, its equation is

$y + 1 = \frac{7 + 1}{6 - 2}\,(x - 2)$ or $2x - y = 5$
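The two-point form translates directly into a small routine returning the constants a and b of y = a + bx. The sketch below is illustrative; it refuses vertical lines, for which the slope does not exist.

```python
def line_through(p1, p2):
    """Intercept a and slope b of the line y = a + b*x through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x2 == x1:
        raise ValueError("vertical line: the slope does not exist")
    b = (y2 - y1) / (x2 - x1)   # ratio of the change in y to the change in x
    a = y1 - b * x1             # from y - y1 = b(x - x1) evaluated at x = 0
    return a, b


print(line_through((2, -1), (6, 7)))
```

For the points (2, -1) and (6, 7) this gives a = -5 and b = 2, that is y = 2x - 5, which is the same line as 2x - y = 5.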

14.6 Fitting a Straight Line. The preceding discussion is intended as a
basis for the presentation of certain methods of fitting a line to data. The
equation y = a + bx represents a family or set of lines corresponding to
different values of the arbitrary constants a and b. The process of finding
the best fitting line for any given data consists in determining a and b. By
"best fitting" we mean best under a criterion of approximation specified by
a method. We will consider three such methods: (a) graphical, (b) the method
of moments of ordinates, (c) the method of least squares.

* When the line is vertical, $\theta = 90°$ and b does not exist. Then $\Delta x = 0$ and division by
zero is excluded in our algebra. With time series the units are usually different for x and y
and the inclination has no physical meaning. The slope is the important quantity.

14.7 Graphical Method. A straight line is drawn (preferably with the
aid of a transparent ruler) to fit as closely as possible the plotted points. To
find the equation of this line, select two points on the line and estimate their
coordinates $(x_1, y_1)$ and $(x_2, y_2)$. Substituting these coordinates in the
"two-point" form of the line (14.5), we get the desired equation.

If the first point is chosen so that $x_1 = 0$ the numerical work of simplifying
the equation is somewhat lessened.

Example 1. Fit a line graphically to the following data.

           x       y
(1948)     0     66.7
           1     72.7
           2     82.3
           3     92.1
           4     93.0
(1953)     5    100.6

We take the origin of x at 1948, hence from the figure ($x_1 = 0$, $y_1 = 68$) and ($x_2 = 5$,
$y_2 = 101$).

By equation (14.5),

$y - 68 = \frac{101 - 68}{5 - 0}\,x$

Therefore,

$y = 6.6x + 68$

is the required equation.

The graphical method is open to the objection that it depends upon the
judgment of the investigator. Different people will locate the line in different
positions and therefore obtain different equations. However, where only
approximate results are needed it is usually quite satisfactory.

14.8 Method of Moments. The constants a and b in the equation of
a straight line fitted mathematically to a given time series are statistics
calculated from the data. They may be regarded as estimates of the parameters
$\alpha$ and $\beta$ of the "true" trend line

(14.6) $y = \alpha + \beta x$

To avoid confusion we will use the symbol $y_i$ for the observed value of the
variate at time $x_i$ and $Y_i$ for the calculated trend value given by

(14.7) $Y_i = a + b x_i$

There are two common methods of fitting a mathematical trend line. In
the method of moments, the constants of the line are chosen so that the $y_i$
and the $Y_i$ have the same zeroth and first moments about the origin of x,
where the rth moment of $y_i$ is defined by $\sum_i y_i x_i^r$. That is to say, for the
straight line,

(14.8) $\sum_i y_i = \sum_i Y_i = \sum_i (a + b x_i)$

and

(14.9) $\sum_i y_i x_i = \sum_i Y_i x_i = \sum_i (a + b x_i) x_i$

If the number of observations is N, these equations may be written

(14.10) $\sum y_i = Na + b\sum x_i$
        $\sum y_i x_i = a\sum x_i + b\sum x_i^2$

and they are called the normal equations of the problem. They are a pair of
simultaneous linear equations for the unknowns a and b.

Example 2. Find by the method of moments the best fitting line for the data in 
Example 1. 


          x        y         xy      x^2
          0      66.7        0.0       0
          1      72.7       72.7       1
          2      82.3      164.6       4
          3      92.1      276.3       9
          4      93.0      372.0      16
          5     100.6      503.0      25
Totals   15     507.4     1388.6      55


The normal equations are

(14.11)   6a + 15b = 507.4
          15a + 55b = 1388.6

These may be solved by eliminating a and obtaining an equation for b. Thus, if we multiply
the first equation by 5 and the second by 2, we get

          30a + 75b = 2537.0
          30a + 110b = 2777.2

Subtracting the first of these from the second, we obtain

          35b = 240.2

whence b = 6.86. The first of (14.11) then gives 6a = 507.4 - 102.9 = 404.5 so that
a = 67.4. The line is therefore

          Y = 67.4 + 6.86x
Alternatively, equation (14.10) may be solved by determinants. The
formulas are:

(14.12) $b = \dfrac{1}{D}\begin{vmatrix} N & \sum y \\ \sum x & \sum xy \end{vmatrix} = \dfrac{N\sum xy - (\sum x)(\sum y)}{D}$

(14.13) $a = \dfrac{1}{D}\begin{vmatrix} \sum y & \sum x \\ \sum xy & \sum x^2 \end{vmatrix} = \dfrac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{D}$

(14.14) where $D = \begin{vmatrix} N & \sum x \\ \sum x & \sum x^2 \end{vmatrix} = N\sum x^2 - (\sum x)^2$

(The subscripts i have been dropped for convenience in printing.) It is
understood, of course, that D is not zero. If it is, the equations for a and b
are either incompatible (with no solutions) or equivalent to each other
(with indeterminate solutions). This is not likely to happen in practice.
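The determinant formulas translate directly into a few lines of code. The following is a minimal sketch in Python, not part of the original text; the function name `fit_line` is an assumption made for illustration. It is applied to the data of Example 1 with x counted in years from 1948.

```python
def fit_line(xs, ys):
    """Return (a, b) for Y = a + bx using formulas (14.12)-(14.14).

    The name fit_line is hypothetical; it is not taken from the text.
    """
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    d = n * sxx - sx ** 2          # determinant D of (14.14)
    if d == 0:
        raise ValueError("D = 0: the normal equations have no unique solution")
    b = (n * sxy - sx * sy) / d    # (14.12)
    a = (sy * sxx - sx * sxy) / d  # (14.13)
    return a, b


# Data of Example 1 (x = years after 1948)
xs = [0, 1, 2, 3, 4, 5]
ys = [66.7, 72.7, 82.3, 92.1, 93.0, 100.6]
a, b = fit_line(xs, ys)
print(f"Y = {a:.1f} + {b:.2f}x")   # prints: Y = 67.4 + 6.86x
```

The result agrees with the elimination carried out in Example 2.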

14.9 The Method of Least Squares. A second general method of fitting
a mathematical curve to a set of data is known as the method of least
squares. It depends on the principle that the "best" fit is obtained
when the sum of squares of the differences $d_i$ between the observed
values $y_i$ and the corresponding calculated values $Y_i$ is as small as
possible. (See Fig. 52.) That is,

(14.15) $\sum d_i^2 = \text{minimum}$

The differences $d_i$ are called residuals. In some circumstances the $d_i$ are
weighted (see References 1 and 3 of the Introduction); but in fitting a straight
line or polynomial to an ordinary time series the weights may all be taken as
unity. For the straight line case, (14.15) becomes

(14.16) $\sum (y_i - a - b x_i)^2 = \text{minimum}$

or, written out in full, with the subscripts dropped,

(14.17) $\sum y^2 + Na^2 + b^2\sum x^2 - 2a\sum y - 2b\sum xy + 2ab\sum x = \text{minimum}$

Now the left side of (14.17) can be regarded as a quadratic expression in b,
of the form

(14.18) $Ab^2 + 2Bb + C$

where $A = \sum x^2$, $B = a\sum x - \sum xy$, and $C = Na^2 - 2a\sum y + \sum y^2$.

Since A is certainly not zero (we cannot fit a curve to one point), equation
(14.18) may be written as

$\dfrac{A^2b^2 + 2ABb + AC}{A} = \dfrac{(Ab + B)^2 + AC - B^2}{A}$

and, among all choices of b, this will have its minimum value when $Ab + B = 0$,
that is, when

(14.19) $b\sum x^2 + a\sum x - \sum xy = 0$

Again, (14.17) may be regarded as a quadratic in a, namely,

(14.20) $A'a^2 + 2B'a + C'$

where $A' = N$, $B' = b\sum x - \sum y$, and $C' = b^2\sum x^2 - 2b\sum xy + \sum y^2$.

Among all choices of a, this has its minimum when $A'a + B' = 0$, that is, when

(14.21) $Na + b\sum x - \sum y = 0$

These equations (14.19) and (14.21) may be obtained more simply by the
student who knows a little calculus, by differentiating the left-hand side of
(14.17) partially with respect to both a and b and putting the derivatives
equal to zero. In any case, it is evident that these are the same equations
as we obtained before (equations (14.10)) by the method of moments. It can
be proved * that, for the fitting of any polynomial curve, the method of least
squares and the method of moments lead to the same normal equations.
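For the reader who wishes to see the calculus route written out, the following sketch may help. The symbol $S(a, b)$ for the sum of squares is an assumption introduced here for convenience; it does not appear in the original text.

```latex
\begin{aligned}
S(a,b) &= \sum (y - a - bx)^2, \\
\frac{\partial S}{\partial a} &= -2\sum (y - a - bx) = 0
  \;\Longrightarrow\; Na + b\sum x - \sum y = 0, \\
\frac{\partial S}{\partial b} &= -2\sum x\,(y - a - bx) = 0
  \;\Longrightarrow\; a\sum x + b\sum x^2 - \sum xy = 0 .
\end{aligned}
```

The two conditions on the right are (14.21) and (14.19) respectively, that is, the normal equations (14.10).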

The first equation of (14.10) can be written

(14.22) $\sum d_i = 0$

so that the sum of the residuals (taking account of the signs) is zero. This
property, combined with (14.15), is analogous to the similar properties of the 
arithmetic mean, namely, that the sum of deviations from the mean is zero 
and the sum of squares of deviations from the mean is less than from any 
other value. 

14.10 Fitting a Straight Line Through the Origin. If it is definitely 
known that the origin is a point on the curve, the line to be fitted is simply 

$Y = bx$

and the least squares condition is

$\sum (y - bx)^2 = \text{minimum}$

or

(14.23) $b^2\sum x^2 - 2b\sum xy + \sum y^2 = \text{minimum}$

* See Reference 2.

Applying the criterion above for a minimum, we have

$b\sum x^2 - \sum xy = 0$

or

$b = \sum xy / \sum x^2$
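As a small illustration of the last formula, here is a sketch in Python. The function name `fit_through_origin` and the three data points are invented for the illustration and are not from the text.

```python
def fit_through_origin(xs, ys):
    """Slope b of the least squares line Y = bx, i.e. b = sum(xy) / sum(x^2).

    Hypothetical helper, written only to illustrate the formula above.
    """
    sxx = sum(x * x for x in xs)
    if sxx == 0:
        raise ValueError("all x values are zero; the slope is undetermined")
    return sum(x * y for x, y in zip(xs, ys)) / sxx


# Invented data lying close to the line Y = 2x
b = fit_through_origin([1, 2, 3], [2.1, 3.9, 6.0])
print(round(b, 3))
```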

14.11 Simplification of Calculations for Equispaced Data. If, as usual in
time series, the observations are taken at equal intervals of time, the calculation
of the constants for fitted curves can be made much more easily. We
first suppose that the number N of observations is odd (= 2k + 1). There
is then a middle observation, the (k + 1)th, whose time is the arithmetic mean
$\bar{x}$ of the $x_i$.

If the common time interval is c we can change to a new unit u, given by
$u_i = (x_i - \bar{x})/c$, and in the new units the times of the observations are
$-k, \ldots, -3, -2, -1, 0, 1, 2, 3, \ldots, k$. Clearly $\sum u_i = 0$, and
$\sum u_i^2 = 2(1^2 + 2^2 + \cdots + k^2) = k(k + 1)(2k + 1)/3 = l$, say. The normal
equations (14.10), in terms of u, become

(14.24) $\sum y = Na, \qquad \sum uy = b\sum u^2 = bl$

whence

$a = (\sum y)/N = \bar{y}, \qquad b = (\sum uy)/l$

The equation of the line is $Y = a + bu = a + b(x - \bar{x})/c$.

Example 3. For the following data, c = 5, l = 2·3·5/3 = 10, $\bar{x}$ = 10. (Instead
of using the formula for l, we can work out the values of $u^2$ and add them.)

           x      u      y     uy    u^2
           0     -2     12    -24      4
           5     -1     15    -15      1
          10      0     17      0      0
          15      1     22     22      1
          20      2     24     48      4
Totals                  90     31     10

Then a = 90/5 = 18, b = 31/10 = 3.1.

The equation is

Y = 18 + 3.1u
  = 18 + (3.1/5)(x - 10)
  = 0.62x + 11.8


If the number of observations is even (= 2k), there is no middle value of x,
but the arithmetic mean $\bar{x}$ is midway between the two middle values. The
values of u are fractional, but it is convenient to double them and compute
$\sum 2uy$ and $\sum 4u^2$.




Example 4. The same data as in Example 3, but with an additional observation.

           x      y     2u    2uy   4u^2
           0     12     -5    -60     25
           5     15     -3    -45      9
          10     17     -1    -17      1
          15     22      1     22      1
          20     24      3     72      9
          25     30      5    150     25
Totals          120           122     70

a = 120/6 = 20, b = 61/17.5 = 3.49.

The line is

Y = 20 + 3.49u
  = 20 + (3.49/5)(x - 12.5)
  = 0.70x + 11.3

For a long series a formula is useful for $\sum u^2 = \tfrac{1}{2}(1^2 + 3^2 + 5^2 + \cdots)$ to
k terms. The formula is $m = \sum u^2 = N(N^2 - 1)/12$, where N = 2k. In
Example 4, k = 3, N = 6, m = 35/2 = 17.5.
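The coded-time shortcut of this section can be sketched in Python as follows. The function name `fit_equispaced` is an assumption made for illustration, and the sketch supposes the x values are sorted and equally spaced. It handles the odd and the even case alike, since it works with u itself rather than with 2u.

```python
def fit_equispaced(xs, ys):
    """Fit Y = a + bu with u = (x - mean of x) / c for equally spaced x.

    Hypothetical helper. Returns (a, b, x_bar, c), so that the line in
    terms of x is Y = a + b * (x - x_bar) / c.
    """
    n = len(xs)
    c = xs[1] - xs[0]                      # common interval
    x_bar = sum(xs) / n
    us = [(x - x_bar) / c for x in xs]     # coded times; their sum is zero
    a = sum(ys) / n                        # a is the mean of y
    b = sum(u * y for u, y in zip(us, ys)) / sum(u * u for u in us)
    return a, b, x_bar, c


# Example 3 (N odd) and Example 4 (N even)
for xs, ys in (([0, 5, 10, 15, 20], [12, 15, 17, 22, 24]),
               ([0, 5, 10, 15, 20, 25], [12, 15, 17, 22, 24, 30])):
    a, b, x_bar, c = fit_equispaced(xs, ys)
    slope = b / c
    intercept = a - slope * x_bar
    print(f"a = {a:.2f}, b = {b:.2f}, Y = {slope:.2f}x + {intercept:.1f}")
```

For a series of N equally spaced points the denominator equals N(N² - 1)/12, so it could be computed from that formula instead of by summation.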

14.12 Exponential Trends. When the given y values form approximately 
a geometric progression while the corresponding x values form an arithmetic 
progression, the relationship between the variables is given by an exponential 
function, and the best fitting curve is said to describe an exponential trend. 
Data from the fields of biology, banking, and economics frequently exhibit such 
a trend. Thus the growth of bacteria is exponential. Money accumulating 
at compound interest follows the same kind of law of growth. And in business,
sales or earnings may grow exponentially over a short period. Another
familiar example is the increase in friction as a rope is coiled around a post. 
As the number of coils increases in arithmetic progression, the pull which the 
rope will stand without slipping increases in geometric progression. 

The characteristic property of this law is that the rate of growth, that is,
the rate of change of Y with respect to x, at any value of x, is proportional to
the value of the function for that value of x. The function

(14.25) $Y = Ac^{Bx}, \qquad A > 0$

has this property.* The letter c is a fixed constant, usually either 10 or e,
whereas A and B are statistics to be determined from the data. If Y decreases
as x increases, B is negative. An interesting example of this case is
the disappearance as time goes on of radioactive substances like radium.

To assume that the apparent law of growth will continue is usually unwarranted,
so only short range predictions can be made with any considerable
degree of reliability. When the exponential character of the observed phenomenon
ceases a saturation point is said to be reached.

* The student of calculus will understand that "rate of change" is used here in the sense
of derivative. For (14.25), $dY/dx = kY$, with $k = B\log_e c$.

If we take logarithms (to base 10) of both sides of (14.25), we obtain

(14.26) $\log Y = \log A + (B \log c)x$

If c = 10, log c = 1; if c = e, log c = 0.4343 approximately. In either case,
(14.26) is of the form

(14.27) $Y' = a + bx$

where

(14.28) $Y' = \log Y, \quad a = \log A, \quad b = B \log c$

and is therefore the equation of a straight line in the coordinates Y' and x.
A method of fitting an exponential trend line to a set of observed y's is thus
to fit a straight trend line to the logarithms of the y's.

If we denote log y by y', we have

(14.29) $b = [N\sum xy' - (\sum x)(\sum y')]/D$
        $a = [(\sum y')(\sum x^2) - (\sum x)(\sum xy')]/D$
        $D = N\sum x^2 - (\sum x)^2$

Example 5. Find the exponential trend for the following data and draw the curve.

x      1      2      3      4      5
y    1.6    4.5   13.8   40.2  125.0

As before, the work can be shortened by using a new variable, u = x - 3. The necessary
computations are

           u        y     y' = log y       uy'     u^2
          -2      1.6       0.2041      -0.4082      4
          -1      4.5       0.6532      -0.6532      1
           0     13.8       1.1399       0           0
           1     40.2       1.6042       1.6042      1
           2    125.0       2.0969       4.1938      4
Totals                      5.6983       4.7366     10

Then

b = $\sum uy' / \sum u^2$ = 0.4737
a = $\sum y' / N$ = 1.1397
Y' = 1.1397 + 0.4737(x - 3)

Therefore

(14.30) Y' = 0.4737x - 0.2814
If we want the form (14.25) we must put A = antilog a = antilog (-0.2814) = 0.5231;
B = 0.4737, if c = 10, or B = 0.4737/0.4343 = 1.091, if c = e. The equation of the
exponential trend is, therefore,

(14.31) $Y = 0.5231(10)^{0.4737x} = 0.5231e^{1.091x}$

For the purposes of plotting the curve, predicting, or interpolating, it is usually best to
calculate Y' from (14.30) and put Y = antilog Y'. Thus the plotted points in Fig. 53 are
obtained from the following table:

x        1        2        3        4        5        6
Y'    0.1923   0.6660   1.1397   1.6134   2.0871   2.5608
Y      1.56     4.63    13.79    41.06    122.2    363.8
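The whole of Example 5 can be reproduced by fitting a straight line to the common logarithms of y. The sketch below is in Python; the function name `fit_exponential` is an assumption made for illustration. It uses formulas (14.29) directly in terms of x rather than the coded variable u, which leads to the same line.

```python
import math


def fit_exponential(xs, ys):
    """Fit Y = A * 10**(b * x) by least squares on log10(y).

    Hypothetical helper. Returns (A, b), where b is the slope of (14.27)
    and A = antilog of the intercept a.
    """
    n = len(xs)
    logs = [math.log10(y) for y in ys]          # y' = log y
    sx = sum(xs)
    sl = sum(logs)
    sxl = sum(x * l for x, l in zip(xs, logs))
    sxx = sum(x * x for x in xs)
    d = n * sxx - sx ** 2
    b = (n * sxl - sx * sl) / d                 # (14.29)
    a = (sl * sxx - sx * sxl) / d
    return 10 ** a, b


A, b = fit_exponential([1, 2, 3, 4, 5], [1.6, 4.5, 13.8, 40.2, 125.0])
print(f"Y = {A:.4f} * 10^({b:.4f} x)")
print(f"B for base e: {b / math.log10(math.e):.3f}")
```

Because the code carries full precision instead of four-figure logarithms, its constants may differ from those of (14.31) in the last printed digit.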


It should be noted that applying the least squares criterion to the
fitting of the straight line (14.27) is not quite the same thing as
applying it to the fitting of the exponential curve (14.25). Our
method gives greater weight to the smaller values of y. This may be
a reasonable thing to do, particularly with economic data, the estimated
error in such data being often roughly proportional to the
magnitude of the quantity measured. If, however, we feel that in an
exponential time series the observations should all be weighted equally,
we can achieve this result approximately by weighting the logarithms
in proportion to y. The weighted least squares condition is *

(14.32) $\sum y_i d_i^2 = \text{minimum}$

where $d_i = y_i' - Y_i' = y_i' - a - bx_i$.

The corresponding normal equations are

(14.33) $a\sum y + b\sum xy = \sum yy', \qquad a\sum xy + b\sum x^2y = \sum xyy'$

[Fig. 53]

* See Part Two, page 318.
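The weighted normal equations (14.33) are again a pair of linear equations in a and b, and can be solved in the same way as (14.10). The following Python sketch is offered only as an illustration; the function name `fit_exponential_weighted` is an assumption, and the weights are taken equal to y as in (14.32).

```python
import math


def fit_exponential_weighted(xs, ys):
    """Solve the weighted normal equations (14.33) for a and b.

    Hypothetical helper. The fitted line is Y' = a + bx, with Y' = log10 Y
    and each squared residual weighted by y.
    """
    logs = [math.log10(y) for y in ys]
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxxy = sum(x * x * y for x, y in zip(xs, ys))
    syl = sum(y * l for y, l in zip(ys, logs))
    sxyl = sum(x * y * l for x, y, l in zip(xs, ys, logs))
    d = sy * sxxy - sxy ** 2
    if d == 0:
        raise ValueError("the weighted normal equations have no unique solution")
    a = (syl * sxxy - sxy * sxyl) / d
    b = (sy * sxyl - sxy * syl) / d
    return a, b


a, b = fit_exponential_weighted([1, 2, 3, 4, 5], [1.6, 4.5, 13.8, 40.2, 125.0])
print(f"Y' = {a:.4f} + {b:.4f}x")
```

With data that follow an exponential law exactly, the weighted and unweighted fits coincide; they differ only in how the departures from the law are balanced.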
14.13 The Compound Interest Law. Equation (14.25) is sometimes
called the compound interest law because it describes the way money would
grow if interest were compounded regularly. If P dollars are invested at
a nominal rate 100j% compounded m times a year, the amount S dollars after
x years is given by the formula

$S = P\left(1 + \dfrac{j}{m}\right)^{mx}$

If j is compounded continuously or, in other words, if m is taken indefinitely
large (written $m \to \infty$), the amount S does not increase indefinitely but
approaches a limiting value. We may write the expression for S in the form

$S = P\left[\left(1 + \dfrac{j}{m}\right)^{m/j}\right]^{jx}$

If we let N = m/j, we have

$S = P\left[\left(1 + \dfrac{1}{N}\right)^{N}\right]^{jx}$

It can be shown that, as $N \to \infty$, the quantity $(1 + 1/N)^N$ approaches the
limit called e. Thus we have

$\lim_{N \to \infty}\left(1 + \dfrac{1}{N}\right)^{N} = e = 2.718\ldots$

This limit is also the base of the Napierian, or natural, system of logarithms.
As $m \to \infty$ so does $N \to \infty$. Therefore in the ideal case of continuous
conversion of interest, we have the limiting form

$S = \lim_{m \to \infty} P\left(1 + \dfrac{j}{m}\right)^{mx} = \lim_{N \to \infty} P\left[\left(1 + \dfrac{1}{N}\right)^{N}\right]^{jx}$

that is

$S = Pe^{jx}$

which is of the form (14.25).

There are several other forms of the exponential function. For example,
if we let $r = c^B$, (14.25) becomes

$Y = Ar^x$

which is the general term of a geometric progression whose first term is A
and common ratio is r.
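The approach of the compound amount to its continuous limit is easy to observe numerically. The following Python sketch is an illustration only; the function names and the figures used (100 dollars at a nominal 6 per cent for 10 years) are assumptions, not taken from the text.

```python
import math


def amount(p, j, m, x):
    """Amount after x years at nominal rate j compounded m times a year."""
    return p * (1 + j / m) ** (m * x)


def amount_continuous(p, j, x):
    """Limiting amount P * e**(j x) for continuous compounding."""
    return p * math.exp(j * x)


# Invented illustration: P = 100, j = 0.06, x = 10 years
for m in (1, 4, 12, 365):
    print(f"m = {m:3d}: S = {amount(100, 0.06, m, 10):.2f}")
print(f"continuous: S = {amount_continuous(100, 0.06, 10):.2f}")
```

The printed amounts increase with m but stay below the continuous value, in line with the statement that S approaches a limiting value rather than increasing indefinitely.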
If c = e, $Y = Ae^{Bx} = A10^{kx}$, where $k = B \log_{10} e = 0.4343B$ approximately.
The factor 0.4343 is called the modulus of logarithms to base 10.
The reciprocal of the modulus, 2.3026, is often useful since for any number N

(14.34) $\log_e N = 2.3026 \log_{10} N$

When the symbol log is used without reference to base, we shall understand
in the future that base 10 is meant. The symbol ln is commonly used for a
Napierian logarithm.

14.14 Semi-logarithmic Graph Paper. In the graphical representation of
data that exhibit an exponential trend, it is often desirable to use semi-logarithmic
paper. Such paper has a logarithmic scale in the vertical direction and
a uniform scale in the horizontal direction (Fig. 54). A logarithmic scale is
one in which the distance from y = 1 to y = N equals log N. A "cycle" of
rulings spaced according to the logarithms of the integers from 1 to 10 is the
unit of the vertical log y scale.

"Semi-log" paper may be constructed or purchased having one or more
cycles. The appropriate number of cycles is determined by the range of y
values in the data to be plotted. If the bottom line of the first cycle is labeled
1 and taken as the origin of log y (log 1 = 0), the beginning of the next cycle is
read 10 (log 10 = 1), the next one above that is read 100 (log 100 = 2), etc.
However, the beginning of the first cycle may be labeled with any number
which is an integral power (positive or negative) of 10, as 0.01, 0.1, 10, 100,
etc. Corresponding lines in successive cycles are labeled with numbers which
are 10 times those in the preceding cycle. Since y has no real logarithm if
y ≤ 0, neither zero nor negative numbers are found on a logarithmic scale.
Plotting a point whose semi-logarithmic coordinates are (x, y) is equivalent
to plotting the point whose rectangular coordinates are (x, log y).
Example 6. Plot $y = 8(2^x)$ on semi-log paper.

Solution. Assigning values to x we form the following table,

x    -3    -2    -1     0     1     2     3     4
y     1     2     4     8    16    32    64   128

from which we obtain the straight-line semi-logarithmic graph shown in Figure 54.


Since (14.27) is linear in x and Y', it is clear that (for A > 0) the graph of
$Y = Ac^{Bx}$ on semi-logarithmic graph paper is bound to be straight. If,
therefore, a time series is suspected to show an exponential trend, the simplest
way to test this is to plot the points on semi-log paper and see whether they lie
nearly on a straight line.

After drawing (by eye) what seems to be the best-fitting straight line, we
can estimate roughly the constants of the exponential curve from the coordinates
of two selected points on this line. Thus, if the line goes through the
points $(x_1, Y_1)$ and $(x_2, Y_2)$, the slope is given by

$\dfrac{Y_2' - Y_1'}{x_2 - x_1} = \dfrac{\log (Y_2/Y_1)}{x_2 - x_1} = B \log c$

If we choose the points $x_1$ and $x_2$ so that the interval corresponds to one cycle,
$Y_2/Y_1 = 10$, and therefore $B \log c = \dfrac{1}{x_2 - x_1}$, or, when c = e,

$B = 2.3026/(x_2 - x_1)$

The value of A can be estimated from the point where the straight line
cuts x = 0. If this point is not on the graph, we can choose any convenient
x, note the corresponding Y, and calculate $A = Ye^{-Bx}$, from a table of the
exponential function. (See Table 37, §10.8.)
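The two-point estimate just described can be sketched in Python as follows, taking c = e. The function name `exponential_from_two_points` is an assumption made for illustration; the two points used are read from the line of Example 6, for which $y = 8(2^x)$.

```python
import math


def exponential_from_two_points(x1, y1, x2, y2):
    """Estimate A and B in Y = A * e**(B x) from two points on the line.

    Hypothetical helper: B is the slope in natural logarithms, and A is
    recovered from A = Y * e**(-B x) at the first point.
    """
    big_b = math.log(y2 / y1) / (x2 - x1)
    big_a = y1 * math.exp(-big_b * x1)
    return big_a, big_b


# Two points on the line of Example 6: (1, 16) and (4, 128)
A, B = exponential_from_two_points(1, 16, 4, 128)
print(f"A = {A:.3f}, B = {B:.4f}")
```

Here A comes out as 8 and B as the natural logarithm of 2, so that the curve is $8e^{0.6931x} = 8(2^x)$, as it should be.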

14.15 Ratio Charts. Graphs on semi-log paper are often called ratio
charts. Their usefulness depends upon the property of logarithms that

$\log \dfrac{M}{N} = \log M - \log N$

It follows that the distance between any two ordinates of the chart measures
the ratio between the values represented by these ordinates. Thus if

$\dfrac{y_1}{y_2} = \dfrac{y_3}{y_4}$

then

$\log y_1 - \log y_2 = \log y_3 - \log y_4$

or

$Y_1' - Y_2' = Y_3' - Y_4'$

that is, equal ratios are represented by equal vertical distances. Likewise, if

$\dfrac{y_1}{y_2} > \dfrac{y_3}{y_4}$

then

$Y_1' - Y_2' > Y_3' - Y_4'$

and the larger ratio is represented graphically by the larger distance. These
differences of elevation are independent of any base line. The same percentage
increase in y is represented by the same addition to the height of Y'
in all parts of the chart. Hence, it is easier to depict and discover percentage
changes on ratio charts than on ordinary charts.

The analysis of time series in economic statistics is often facilitated by
forming "link relatives," which are ratios of each ordinate (after the first) to
the preceding ordinate. Thus, if $y_1, y_2, \ldots, y_n$ are the given values, the link
relatives are

$R_1 = \dfrac{y_2}{y_1}, \quad R_2 = \dfrac{y_3}{y_2}, \quad \ldots, \quad R_{n-1} = \dfrac{y_n}{y_{n-1}}$

For any link relative, 100(R - 1) denotes the percentage change in y from one
month (say) to the next. If the y's are plotted on ratio paper they will lie
on a straight line when the R's are equal, on a curve bending upward when
the R's are increasing, and on a curve bending downward when the R's are
decreasing. It follows that if two curves are parallel on ratio paper their
percentage rate of increase (or decrease) is the same.
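Link relatives are simple to compute. A minimal Python sketch follows; the function names are assumptions made for illustration, and the short series is invented.

```python
def link_relatives(ys):
    """Ratios of each value (after the first) to the preceding value."""
    return [later / earlier for earlier, later in zip(ys, ys[1:])]


def percentage_changes(ys):
    """Percentage change 100(R - 1) from each value to the next."""
    return [100 * (r - 1) for r in link_relatives(ys)]


# Invented series growing by a constant ratio, then slowing down
series = [100, 110, 121, 127.05]
print([round(r, 3) for r in link_relatives(series)])
print([round(p, 1) for p in percentage_changes(series)])
```

Equal link relatives, as in the first two steps of this series, correspond to a straight stretch on ratio paper; the smaller final relative corresponds to a curve bending downward.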


Table 55. Death Rates per 100,000 (U. S. A.) for Tuberculosis
and Typhoid Fever, 1900-1920

Year   Tuberculosis   Typhoid      Year   Tuberculosis   Typhoid
1900      195.2         31.3       1911      159.0         16.3
1901      189.8         27.5       1912      149.8         13.2
1902      174.1         26.3       1913      148.7         12.6
1903      177.1         24.6       1914      148.6         10.8
1904      188.5         23.9       1915      146.7          9.2
1905      180.9         22.4       1916      143.8          8.8
1906      177.8         22.0       1917      147.1          8.1
1907      175.6         20.5       1918      151.0          7.0
1908      169.4         19.6       1919      124.9          4.8
1909      163.3         17.2       1920      112.0          6.0
1910      164.7         18.0





An example of the different impressions that may be given by plotting the
same observations on ordinary and on semi-log paper is furnished by the data
in Table 55, relating to the death rates in certain states of the U.S.A. for
tuberculosis (all forms) and for typhoid fever. In Fig. 55 these data are
plotted on ordinary graph paper and in Fig. 56 on semi-log graph paper. The
decline in absolute value is greater for T.B. than for typhoid, as shown by
the steeper slope in Fig. 55, but the relative decline (the percentage decrease)
is greater for typhoid than for T.B. (about 80% as against 40%), as shown
by the steeper slope in Fig. 56. The two graphs bring out different aspects
of the same statistical data.

[Fig. 55. Death Rates from Tuberculosis and Typhoid Fever]

14.16 Logarithmic Graph Paper. Coordinate paper on which the rulings 
in both directions are at distances from the origin proportional to the logarithms 
of the numbers represented is called logarithmic paper or log-log paper. It is 
not used very much for time series, but is mentioned here because of the 
similarity to semi-log paper, which is used a great deal. 

Its main purpose is to represent by a straight line power functions of the form

(14.36) $Y = ax^b, \qquad a > 0$

[Fig. 56. Death Rates from Tuberculosis and Typhoid Fever]

Since on log-log paper the abscissas and ordinates are proportional to
log x and log Y, the graph of (14.36) on such paper is a straight line. The constants
a and b can be approximately evaluated from the slope of this line and its
intercept on the Y axis. More exact values can be obtained by fitting the
logarithmic form of (14.36), which is a straight line, by least squares to the
logarithms of the observed x and y values.

Log-log paper is useful for graphing distributions which cover a very great
range in both variables. A table of cumulative frequencies for persons in
the U.S. reporting incomes in excess of a specified amount is of this nature.
The x-variable (income) may range from $2000 up to a million or more, and
the y-variable (cumulative frequency) from a few tens up to a few millions.
The data would be difficult to plot except on log-log paper covering several
cycles.

This kind of graph paper is also frequently useful in engineering, where
empirical relationships of the power function type often occur. The relation
between pressure and volume of a gas in adiabatic expansion, $p = kv^{-\gamma}$, is of
this type, and so is the formula for the flow of water over a rectangular weir
of breadth B and height H, $Q = 3.33BH^{3/2}$. Both these relationships give
straight lines on log-log paper.
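Fitting a power function by least squares on the logarithms of both variables can be sketched in Python as follows. The function name `fit_power_law` and the sample figures are assumptions made for illustration; the sample values are generated from an adiabatic-type law with invented constants, not measured data.

```python
import math


def fit_power_law(xs, ys):
    """Fit Y = a * x**b by a least squares line through (log x, log y).

    Hypothetical helper. Requires all x and y to be positive.
    """
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(xs)
    sx, sy = sum(lx), sum(ly)
    sxy = sum(u * v for u, v in zip(lx, ly))
    sxx = sum(u * u for u in lx)
    d = n * sxx - sx ** 2
    slope = (n * sxy - sx * sy) / d
    intercept = (sy * sxx - sx * sxy) / d
    return 10 ** intercept, slope


# Invented values following p = 5 * v**(-1.4) exactly
vs = [1, 2, 4, 8]
ps = [5 * v ** -1.4 for v in vs]
a, b = fit_power_law(vs, ps)
print(f"a = {a:.3f}, b = {b:.3f}")
```

As with the exponential trend, least squares on the logarithms gives relatively more weight to the smaller values than least squares on the original scale would.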

14.17 Other Types of Trend. The fitting of a parabolic trend line by 
least squares will be described in Chapter 16. Some other types of trend line 
occasionally required are : 

(a) the modified exponential curve, 

(b) the Gompertz curve, 

(c) the Makeham curve, 

(d) the logistic. 

Modified Exponential Curve. This curve has the equation

(14.37) $Y = A + Be^{px} = A + Bq^x, \qquad q = e^p$

and the graph on semi-log paper is concave upward if A is positive and concave
downward if A is negative. (B is assumed positive. The curvatures
are reversed if B is negative.) A graphical method of fitting, due to Cowden
(Reference 3), consists in plotting the data on ordinary or semi-log graph
paper, drawing a tentative trend line and selecting three equidistant ordinates,
well apart, say at x - c, x, and x + c. If the corresponding ordinates are
$Y_0$, $Y_1$ and $Y_2$, we have from (14.37)

(14.38) $Y_0 = A + Be^{p(x - c)}$
        $Y_1 = A + Be^{px}$
        $Y_2 = A + Be^{p(x + c)}$

Therefore

(14.39) $(Y_2 - Y_1)/(Y_1 - Y_0) = (e^{pc} - 1)/(1 - e^{-pc}) = e^{pc}$

Also $Y_1 - Y_0 = Be^{px}(1 - e^{-pc}) = Be^{p(x - c)}(e^{pc} - 1)$

From (14.39), $e^{pc} - 1 = (Y_2 - 2Y_1 + Y_0)/(Y_1 - Y_0)$

so that $Be^{p(x - c)} = (Y_1 - Y_0)^2/(Y_2 - 2Y_1 + Y_0)$

Then, from the first equation of (14.38),

(14.40) $A = Y_0 - (Y_1 - Y_0)^2/(Y_2 - 2Y_1 + Y_0)$
          $= (Y_0Y_2 - Y_1^2)/(Y_2 - 2Y_1 + Y_0)$

Having estimated A in this way we plot values of $y_i - A$ (the $y_i$ are the
observed values) on semi-log paper. If necessary we adjust A a little until a
straight line fits reasonably well. Then the ordinate of this line at x = 0
gives the value of B and the ratio of the ordinates at x and x - c gives $e^{pc} = q^c$,
from which q can be found. If the values of $y_i - A$ are negative, the sign of
the scale values on the semi-log paper must be changed.
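The three-ordinate estimate of A in (14.40) can be sketched in Python as follows. The function name `modified_exponential_three_point` is an assumption made for illustration. The text obtains B and q graphically after adjusting A; the purely numerical recovery of q and B below, from the same three ordinates, is an assumption of this sketch and not the author's procedure.

```python
def modified_exponential_three_point(x1, c, y0, y1, y2):
    """Estimate A, B, q in Y = A + B * q**x from ordinates at x1-c, x1, x1+c.

    Hypothetical helper. A comes from (14.40); q**c is the ratio in (14.39);
    B is then taken from the middle ordinate (an assumption of this sketch).
    """
    denom = y2 - 2 * y1 + y0
    if denom == 0 or y1 == y0:
        raise ValueError("ordinates lie on a straight line; no curve is determined")
    big_a = (y0 * y2 - y1 ** 2) / denom           # (14.40)
    ratio = (y2 - y1) / (y1 - y0)                 # e**(p c) = q**c, from (14.39)
    if ratio <= 0:
        raise ValueError("ordinates are not consistent with the curve (14.37)")
    q = ratio ** (1 / c)
    big_b = (y1 - big_a) / q ** x1
    return big_a, big_b, q


# Invented ordinates taken from Y = 10 + 2 * 1.5**x at x = 1, 3, 5
A, B, q = modified_exponential_three_point(3, 2, 13.0, 16.75, 25.1875)
print(A, B, q)
```

With ordinates read from a freehand trend line the three constants will of course be only rough, which is why the text recommends adjusting A on semi-log paper.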

Gompertz Curve. Type (b) is used in actuarial work and has had some
application as a growth curve in business and population forecasting. Its
equation* is

(14.41) $Y = ab^{q^x}$

or, in logarithmic form,

(14.42) $\log Y = \log a + q^x \log b = A + Bq^x$

where A = log a, B = log b. This is the same equation as (14.37) with
log Y instead of Y. If q < 1, we see from (14.42) that $Y \to a$ as $x \to \infty$.
The line Y = a is an asymptote to the curve, and a is sometimes called
the ceiling of the curve. (Fig. 57.) The curve may be fitted by a modification
of Cowden's method described before, plotting $\log y_i$ instead of $y_i$.

* For a derivation see References 4 or 6.

Makeham Curve. Type (c) is also used in actuarial work. The equation is

(14.43) $Y = ks^x b^{q^x}$

or

(14.44) $\log Y = \log k + x \log s + q^x \log b$
             $= A + Bq^x + Cx, \qquad C = \log s$

It is a combination of a straight line with a Gompertz curve. For a discussion
of its use in the field of insurance, see Reference 5.

Logistic. This curve has been widely used in the study of growth. Its
equation is

(14.45) $Y = a/(1 + bq^x)$

or

(14.46) $\dfrac{1}{Y} = \dfrac{1}{a} + \dfrac{b}{a}q^x = A + Bq^x$

which is the same as (14.37) with 1/Y instead of Y. The same method of
fitting will apply, with the reciprocals $1/y_i$ plotted instead of the $y_i$. For
convenience of computing trend values when the constants have been determined,
(14.46) may be written

(14.47) $\log\left(\dfrac{1}{Y} - A\right) = \log B + x \log q$

A logistic curve is a fairly good fit to the curve of population of the U.S.A.
shown in Fig. 49, §14.1. Recent census figures, however, have not followed
the logistic trend as closely as was hoped by Raymond Pearl and others who
first advocated the use of this curve for population studies.

14.18 The Analysis of Business Time Series. Long series of monthly
data, such as those on the mineral production of the United States over twenty
or thirty years, or the sales of a large business concern over a similar time,
may be analyzed into (a) a long-term trend, T; (b) oscillations or cycles, C;
(c) seasonal fluctuations, S; (d) random irregular variations, I; each of these
may be regarded as proportional to the preceding ones. That is to say, C
fluctuates around the trend as a base, S fluctuates around a curve combining
trend and cycles, while I fluctuates around the combined curve of trend,
cycles, and seasonal variation, and the combined effect T + C + S + I may
be written

$T\left(\dfrac{T + C}{T}\right)\left(\dfrac{T + C + S}{T + C}\right)\left(\dfrac{T + C + S + I}{T + C + S}\right)$

The second factor may be regarded as the cyclical effect, expressed as a fraction
of T and fluctuating around the value 1. We denote it by $C_1$, and similarly
the other factors may be denoted by $S_1$ and $I_1$. Then the time series is
expressed by $y = TC_1S_1I_1$, and consequently

$\log y = \log T + \log C_1 + \log S_1 + \log I_1$

This means that if the various components are plotted separately on semi-log
paper, the ordinate of y on the same paper will be the sum of the ordinates
of the separate components.

The analysis of the time series into these components involves several steps : 

1. Estimating seasonal movements. 

2. Adjusting the data by dividing by the seasonal index $S_1$.

3. Computing the trend either by moving averages or by a mathematical 
equation. 

4. Adjusting for trend by dividing by T. 

5. Smoothing out the irregularities by a short moving average. This 
leaves only the cyclical movements. 

14.19 Deflating and Deseasonalizing Data. Before attempting to analyze 
a series expressing business activity it is often advisable to see that the figures 
are as comparable with each other as possible. For example, if one is interested
in the variation in volume of sales of a commodity, the dollar value of
the sales might not be a true indication of this variation because of the change
in price through the period studied. Hence the figures in a value series are 
often divided each by an appropriate price index number to obtain a com- 
parable quantity series, representing volume (of sales, production, etc.). 
This process is known as deflating the series. 

Another difficulty is caused by the irregularities of the calendar, in parti- 
cular by the different lengths of the months. Monthly sales figures may be 
rendered more comparable by multiplying them by a factor which is the ratio 
of the average number of days in a month to the actual number in a particular 
month. Production figures for an industry may be adjusted by a factor 
based on the number of working days in the month. We suppose adjustments
such as these carried out before the analysis of the time series begins.
The first step is to smooth out the seasonal movements by a 12-month moving
average (usually followed by a 2-month moving average, so as to center the
final average on a calendar month). The resulting curve is an estimate of
the trend and the cyclical movements combined (TC). The original data
are divided by these TC estimates, giving estimates of SI (seasonal and 
irregular movements), corrected for the effects of trend and cycles, and ex- 
pressed as percentages of the centered 12-month moving average. 

An illustration of this process is furnished by the fictitious data of Table
56, relating to sales of the XYZ Products Company. The monthly figures
are in column (2), the centered 12-month moving average in column (5),
and the seasonal and irregular movements, as a percentage of the moving
average, in column (6). Columns (2) and (5) are graphed in Fig. 58 and
column (6) in Fig. 59.



A seasonal index is then formed by averaging all the January values, all 
the February values, and so on, in column (6), and adjusting the results by 
a suitable factor so that the twelve averages themselves average exactly 100 
per cent. For brevity, Table 56 has been confined to 3 years, and there are 
only two values for each month in column (6), but in practice there would 
probably be a series covering 15 or 20 years to work on. However, to illus- 
trate the method we will compute the seasonal index from the limited data 
given. 
Table 56. Monthly Sales of XYZ Products Co.

(1)        (2)           (3)           (4)           (5)              (6)
Date       Sales         12-mo.        2-mo.         Centered         Estimated
           (thousands    moving        moving        12-mo. moving    SI
           of dollars)   total         total         average          (100(2) ÷ (5))
                                                     ((4) ÷ 24)

1949 J       3639
     F       3591
     M       3326
     A       3469
     My      3321
     J       3320        41 424
     Jy      3205        41 698        83 122        3463              92.5
     A       3205        41 963        83 661        3486              91.9
     S       3255        42 351        84 314        3513              92.7
     O       3550        42 702        85 053        3544             100.2
     N       3771        43 028        85 730        3572             105.6
     D       3772        43 206        86 234        3593             105.0
1950 J       3913        43 477        86 683        3612             108.3
     F       3856        43 626        87 103        3629             106.3
     M       3714        43 965        87 591        3650             101.8
     A       3820        44 245        88 210        3675             103.9
     My      3647        44 657        88 902        3704              98.5
     J       3498        45 367        90 024        3751              93.3
     Jy      3476        45 847        91 214        3801              91.4
     A       3354        46 521        92 368        3849              87.1
     S       3594        47 094        93 615        3901              92.1
     O       3830        47 679        94 773        3949              97.0
     N       4183        48 056        95 735        3989             104.9
     D       4482        48 550        96 606        4025             111.4




Table 56. Monthly Sales of XYZ Products Co. (Continued)

(1)        (2)           (3)           (4)           (5)              (6)
Date       Sales         12-mo.        2-mo.         Centered         Estimated
           (thousands    moving        moving        12-mo. moving    SI
           of dollars)   total         total         average          (100(2) ÷ (5))
                                                     ((4) ÷ 24)

1951 J       4393        48 869        97 419        4059             108.2
     F       4530        49 007        97 876        4078             111.1
     M       4287        48 984        97 991        4083             105.0
     A       4405        49 077        98 061        4086             107.8
     My      4024        48 878        97 955        4081              98.6
     J       3992        48 276        97 154        4048              98.6
     Jy      3795
     A       3492
     S       3571
     O       3923
     N       3984
     D       3880





         J      F      M      A      My     J      Jy     A      S      O      N      D

       108.3  106.3  101.8  103.9   98.5   93.3   92.5   91.9   92.7  100.2  105.6  105.0
       108.2  111.1  105.0  107.8   98.6   98.6   91.4   87.1   92.1   97.0  104.9  111.4

Av.    108.2  108.7  103.4  105.8   98.6   96.0   92.0   89.5   92.4   98.6  105.2  108.2

If one or two out of a dozen or more monthly values differ markedly from
the rest, they may be excluded in forming the average, as the purpose of this
average is to get a typical representative number for the month.

The total of the averages in our example is 1206.6 instead of 1200, so each 
of them is multiplied by the factor 1200/1206.6 = 0.9945. The final seasonal 
index is 

J FMAMyJJyASOND 
107.6 108.1 102.8 105 2 98.1 96.5 91.5 89.0 91.9 98.1 104.6 107.6 


The original data are deseaaonalized by dividing each month’s figure by 
the seasonal index for that month. 
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Jf fmamjjasondj fmamjjasondjfmamjjasonds 
1W9 1960 1961 

Date 


Fig, 59, Seasonal and Irregular Components 
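
The ratio-to-moving-average procedure of Table 56 can be sketched in Python.
This is an illustration under stated assumptions, not the author's worked
computation: the function name `seasonal_index` is hypothetical, the sales
figures are those of column (2) of Table 56, and every month's ratios are
averaged without excluding any outlying value. Since nothing is rounded along
the way, the results agree with the printed index only to within a few tenths.

```python
# Monthly sales (thousands of dollars), January 1949 to December 1951 (Table 56).
sales = [
    3639, 3591, 3326, 3469, 3321, 3320, 3205, 3205, 3255, 3550, 3771, 3772,
    3913, 3856, 3714, 3820, 3647, 3498, 3476, 3354, 3594, 3830, 4183, 4482,
    4393, 4530, 4287, 4405, 4024, 3992, 3795, 3492, 3571, 3923, 3984, 3880,
]


def seasonal_index(series, period=12):
    """Seasonal index (percentages totalling 100 * period) for a monthly series."""
    n = len(series)
    # 12-month moving totals; totals[k] covers months k .. k + 11.
    totals = [sum(series[k:k + period]) for k in range(n - period + 1)]
    ratios = {}  # calendar month (0 = January) -> list of SI estimates
    for k in range(len(totals) - 1):
        month = k + period // 2
        # Centered moving average: two adjacent 12-month totals divided by 24.
        centered = (totals[k] + totals[k + 1]) / (2 * period)
        ratios.setdefault(month % period, []).append(100 * series[month] / centered)
    averages = [sum(ratios[m]) / len(ratios[m]) for m in range(period)]
    # Rescale so that the twelve indices total 1200.
    factor = 100 * period / sum(averages)
    return [a * factor for a in averages]


index = seasonal_index(sales)
# Deseasonalize: divide each month's figure by its index taken as a percentage.
deseasonalized = [100 * s / index[i % 12] for i, s in enumerate(sales)]
```

The list `index` starts with January, and `deseasonalized[0]` corresponds to
the first entry of the deseasonalized series used in the next section.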


14.20 Elimination of Trend and Irregularities. The type of trend curve
to be fitted depends on the appearance of the data as plotted, and no rules
can be given. If possible, one that can be expressed by a mathematical
equation should be used, and, unless there are theoretical or logical reasons
for preferring a certain equation, the simpler the better. By way of
illustration we have fitted two straight lines to the deseasonalized data of
Table 56. The computations are shown in Table 56a. When the data are divided
by the trend values, the result is the cyclical and irregular component only
(CI). The irregularities may be smoothed out by taking a short moving average,
such as a binomially weighted average of 3. The cyclical components then
remain, and their interpretation (with real data) constitutes the task of the
economist. In Fig. 60 the fit of the straight trend lines to the
deseasonalized data is indicated. The cyclical and irregular components may be
plotted separately, but their general appearance can be estimated from the
figure.

Table 56a. Fit of Two Straight Trend Lines to Deseasonalized Data


           Deseasonalized
           sales (y)              u                  uy              Y = a + bu
 Date      (thousands of
           dollars)           (1)    (2)        (1)       (2)        (1)     (2)

1949 J       3382             -13            -43 966                3250
     F       3322             -12            -39 864                3280
     M       3235             -11            -35 585                3315
     A       3298             -10            -32 980                3348
     My      3385              -9            -30 465                3380
     J       3476              -8            -27 808                3413
     Jy      3503              -7            -24 521                3446
     A       3601              -6            -21 606                3478
     S       3542              -5            -17 710                3511
     O       3619              -4            -14 476                3544
     N       3605              -3            -10 815                3576
     D       3506              -2            - 7 012                3609
1950 J       3637              -1            - 3 637                3642
     F       3567               0                  0                3674
     M       3613               1              3 613                3707
     A       3631               2              7 262                3740
     My      3718               3             11 154                3772
     J       3663               4             14 652                3805
     Jy      3799               5             18 995                3838
     A       3769               6             22 614                3870
     S       3911               7             27 377                3903
     O       3904               8             31 232                3936
     N       3999               9             35 991                3968
     D       4165              10             41 650                4001
1951 J       4083              11             44 913                4034
     F       4191              12     -5      50 292   -20 955      4066    4274
     M       4170              13     -4      54 210   -16 680      4099    4223
     A       4187                     -3               -12 561              4172
     My      4102                     -2               - 8 204              4121
     J       4180                     -1               - 4 180              4069
     Jy      4148                      0                     0              4018
     A       3924                      1                 3 924              3967
     S       3886                      2                 7 772              3916
     O       3999                      3                11 997              3865
     N       3809                      4                15 236              3814
     D       3606                      5                18 030              3763

Fig. 60. Deseasonalized Sales Data and Trend Lines
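
The fit of the second trend line in Table 56a, and the later steps of this
section, can be sketched as follows. This is a minimal sketch under
assumptions: the helper names `fit_trend` and `binomial_smooth3` are
hypothetical, the series must contain an odd number of months so that the
coded time u sums to zero, and the end points are simply dropped when
smoothing.

```python
def fit_trend(y):
    """Least-squares line Y = a + b*u, with u = ..., -1, 0, 1, ... centered on the middle month."""
    n = len(y)
    if n % 2 == 0:
        raise ValueError("this sketch assumes an odd number of points")
    half = n // 2
    u = list(range(-half, half + 1))          # sum(u) == 0
    a = sum(y) / n                            # a is the mean of y
    b = sum(ui * yi for ui, yi in zip(u, y)) / sum(ui * ui for ui in u)
    return a, b, [a + b * ui for ui in u]


def binomial_smooth3(values):
    """Binomially weighted moving average of 3 (weights 1, 2, 1 over 4)."""
    return [(values[i - 1] + 2 * values[i] + values[i + 1]) / 4
            for i in range(1, len(values) - 1)]


# Deseasonalized sales, February to December 1951 (second line of Table 56a).
y2 = [4191, 4170, 4187, 4102, 4180, 4148, 3924, 3886, 3999, 3809, 3606]

a, b, trend = fit_trend(y2)
# Cyclical and irregular component (CI), as a percentage of trend.
ci = [100 * yi / ti for yi, ti in zip(y2, trend)]
# Smoothing CI leaves an estimate of the cyclical component alone.
cyclical = binomial_smooth3(ci)
```

With these data a is about 4018.4 and b about -51.1, and the rounded trend
values reproduce column (2) of Y in Table 56a.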


Exercises

1. (Wilson and Tracy) The premium ($y) on a $1000 life insurance policy for various
ages (x years) is given in the following table. Draw a graph exhibiting y as a function of x.
Estimate from the graph the premium at age 32 and at age 43; also the age at which the
premium is $52.

x    20     25     30     35     40     45     50     55     60
y   18.78  21.02  23.86  27.54  32.36  38.83  47.68  59.88  76.94

2. Find the equation of each of the straight lines through two points given as follows:
(a) (2, 6), (4, 6); (b) (0, 3), (1, 6).

3. Find the equation of a line through the point (2, 3) and parallel to the line
+ 6y = 7.

4. Find the value of x for which $f(x) = 2x^2 - 8x + 9$ has a minimum value. What is
this minimum?

Hint. Use the method of §14.9. 

5. Fit a straight line to the following data:

x    6    7    8    9   10   11   12   13
y    7    7    6    4    4    3    2    1

Ans. $Y = -0.90x + 12.8$.

6. Calculate the values of Y for each value of x in Exercise 5, obtain the values of the
deviations $d_i$, and check that $\sum d_i = 0$.

7. Calculate $\sum d_i^2$ in Ex. 6. Another line for which $\sum d_i = 0$ is $Y = -x + 13.75$.
Check that $\sum d_i^2$ for this line is greater than for $Y = -0.90x + 12.8$.

8. The uniform horizontal scale on a sheet of semi-log paper ranges from 0 to 10. The
vertical logarithmic scale ranges from 100 to 1000. A straight line is drawn on the paper
from the upper end point of the vertical scale to the mid-point of the horizontal scale. Find
the exponential function represented by the line. What is the equation of the line in
(Y', x) coordinates, where $Y' = \log_{10} Y$? Ans. $Y = 1000e^{-0.46x}$, $Y' = 3 - 0.2x$.

9. A straight line is drawn on logarithmic graph paper through the points (4, 16) and 

(6, 64). Find the function represented by this line. Ans. y » 2*/4. 

10. Draw the graph of ~ 10 on semi-log paper. 

11. In the following table y represents the fire losses in the United States for the years 
mentioned (in millions of dollars). Find the best fitting straight line (in the least squares 
sense) for the data. 


x   1915   1917   1919   1921   1923   1925
y    172    290    321    495    535    570

12. Add the pair of values 2 = 6, y = 300 to the data of Example 6, §14.12, and find the 
equation of the best fitting exponential curve. Ans, F' == 0.46172 — 0.2534, 

F = 0.56e'««*. 

18. Sketch the curves y = 10 e"' and y == 10 e"**'*, for 0 <2 < 3. 

14. A straight line is drawn on semi-log paper through the points (2, 1) and (4, 100).
What function has this line for its graph? Hint. Put $Y = Ar^x$. Ans. $100Y = 10^x$.

15. Data from a certain experiment involving voltage (v) as a function of time (t) are
plotted on logarithmic coordinate paper, and are found to exhibit a linear trend there.
A line is drawn, with a transparent ruler, which seems to fit the plotted data best. Two
points on this line are (6, 18) and (8, 32). Determine an equation expressing v in terms of t
whose logarithmic graph is the line.

16. Draw the graph of $y = 25x^n$ on logarithmic coordinate paper, (a) when n = 2,
(b) when n = -2. Mark scales clearly.

17. Fit a logistic curve to the population figures of the United States, 1790-1950, using
the method described in §14.17. The approximate figures, in millions, are:

x   1790   1800   1810   1820   1830   1840   1850   1860   1870   1880
y   3.93   5.31   7.24   9.64  12.87  17.07  23.19  31.44  39.82  50.16

x   1890   1900   1910    1920    1930    1940    1950
y  62.95  76.00  91.97  105.71  122.78  131.67  150.70


CHAPTER XV

LINEAR REGRESSION AND CORRELATION

15.1 Bivariate Data. Until now we have been concerned with problems
of variation in a single variable quantity, or (in Chapter XIV) with a variable
quantity depending on the time. We shall now consider the simultaneous
variation of two variable quantities. The methods of expressing the
relationship between two variables are due mainly to the English biometricians
Sir Francis Galton (1822-1911) and Karl Pearson (1857-1936).

Data presenting two sets of related measurements or observations may
arise in many fields of activity yielding N pairs of corresponding observations
$(x_i, y_i)$, $i = 1, 2, 3, \ldots, N$. Thus x may represent July rainfall and y the
average yield of corn in a certain section; x may be an index of commodity
prices and y an index of employment over the same period; we may be
interested in a group of school children in which x is their height and y their
weight, or x may refer to their reading ability and y to their spelling ability;
we may be studying the chance distributions which are obtained in throwing two
dice where x is the number obtained in throws of a single die and y is the
number obtained in throws of the two dice together.

Example 1. In the following set of selected heights (inches), x = stature of father,
y = stature of son.

x   69   70   69   68   70   73   69   67   69   64
y   68   69   72   67   70   71   72   66   71   65


Example 2. (Snedecor) The following data on twelve trees are adapted from the results
of an experiment to test the phenomenon that the injury by codling moth larvae seems to
be greatest on apple trees bearing a small crop. Here x = hundreds of fruit on a tree,
y = percentage of fruits wormy.

x   15   15   12   26   18   12    8   38   26   19   29   22
y   52   46   38   37   37   37   34   25   22   22   20   14

When the given pairs of values $(x_i, y_i)$ are plotted on ordinary graph
paper, we obtain a "dot diagram" or "scatter diagram." Fig. 61 shows the
scatter diagram for the data of Example 2. There are two main problems
involved in the relationship between x and y. The first is to find the most
suitable form of equation for use in predicting the average y for a given x or
in predicting the average x for a given y, and to estimate the error in such
predictions. This is the problem of regression. The second is to find a
measure of the degree of association, or correlation as it is called, between the
values of x and those of y. The two problems are closely related.

Fig. 61

The field of relationship may be thought of as bounded on the one extreme
by perfect functional dependence and on the other extreme by complete
independence in the probability sense. For example, the pairs of values which
satisfy the equation $y = 2x - 5$ do not present a statistical problem. In
this case the relationship is defined by a mathematical function $y = f(x)$.
Similarly, at the other extreme we would not be concerned with pairs of values
which are completely independent in the probability sense, as, for example,
the grades of students in statistics and the heights of their fathers. Two
variables are said to be statistically related when they lie between these two
extremes of relationship.

15.2 Regression. If we fit a straight line by least squares to the dots of
the scatter diagram in such a way as to minimize the sum of the squares of
the distances parallel to the y axis from the dots to the line, we obtain the
regression line of y on x. As we saw in §14.9 this line has the equation

(15.1)    $Y = a + bx$

where a and b are given by

(15.2)    $Na + b\sum x = \sum y$
          $a\sum x + b\sum x^2 = \sum xy$

We cannot simplify these equations as we did for the trend line fitted to a
time series, because the x values are not usually equally spaced. The same
general solution holds, however, namely:

(15.3)    $b = \dfrac{N\sum xy - \sum x \sum y}{N\sum x^2 - (\sum x)^2}$

(15.4)    $a = \dfrac{\sum y - b\sum x}{N}$

The quantity b, which is the slope of the regression line, is usually called the
regression coefficient.

Now, if $s_x^2$ is the variance of the N values $x_i$, we have

(15.5)    $Ns_x^2 = \sum (x - \bar{x})^2, \quad \bar{x} = (\sum x)/N$

and in a similar way we define a quantity $s_{xy}$, called the covariance of the N
pairs of values $(x_i, y_i)$, by the relationship

(15.6)    $Ns_{xy} = \sum (x - \bar{x})(y - \bar{y})$

Unlike the variance, this quantity may be positive or negative. Just as
we may write

(15.7)    $Ns_x^2 = \sum x^2 - (\sum x)^2/N$

so we have

(15.8)    $Ns_{xy} = \sum xy - (\sum x)(\sum y)/N$

which is easily proved to be equivalent to (15.6). With this notation, (15.3)
and (15.4) become

(15.9)    $b = s_{xy}/s_x^2, \quad a = \bar{y} - b\bar{x}$

so that the equation of the regression line is

(15.10)   $Y - \bar{y} = \dfrac{s_{xy}}{s_x^2}(x - \bar{x})$

The line, therefore, passes through the point whose coordinates are $(\bar{x}, \bar{y})$.

The term regression was used first by Galton in studying inheritance of
stature. He found that offspring of abnormally tall or short parents tend to
"step back" or "regress" to the ordinary population height. However, as
now used, regression line has no reference to biometry, but is merely a
convenient term.

For the actual calculation of the constants a and b, with a set of discrete
data, it is generally best to use (15.3) and (15.4). The computation for the
data of Example 2 is set out in Table 57.


Table 57. Calculation of Variances and Covariance for Data of Example 2

     x       y       x²       y²      xy

    15      52      225     2704     780
    15      46      225     2116     690
    12      38      144     1444     456
    26      37      676     1369     962
    18      37      324     1369     666
    12      37      144     1369     444
     8      34       64     1156     272
    38      25     1444      625     950
    26      22      676      484     572
    19      22      361      484     418
    29      20      841      400     580
    22      14      484      196     308

   240     384     5708    13716    7098

We have

$b = \dfrac{12(7098) - (240)(384)}{12(5708) - (240)^2} = \dfrac{-6984}{10896} = -0.641$

$a = [384 - 240(-0.6410)]/12 = 44.82$

so that the line is

(15.11)   $Y = 44.82 - 0.641x$

This line is marked (1) in Fig. 61. It is the appropriate line to use if we
wish to estimate y for a given value of x. The x values are considered to be
without appreciable error and the y values to vary in a random manner around
the regression line.
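
The solution (15.3) and (15.4) of the normal equations can be sketched in a
few lines of Python. This is a minimal illustration, not part of the original
computation, and the function name `regression_line` is hypothetical. One
caution: the x² column of Table 57 as listed sums to 5608, not to the printed
total 5708, so a direct computation from the twelve pairs gives b of about
-0.720 and a of about 46.41 rather than the -0.641 and 44.82 obtained above
from the printed total.

```python
def regression_line(x, y):
    """Return (a, b) for the least-squares line Y = a + b*x of y on x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    # Slope from the normal equations, then the intercept.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


# Data of Example 2 as listed in Table 57.
x = [15, 15, 12, 26, 18, 12, 8, 38, 26, 19, 29, 22]
y = [52, 46, 38, 37, 37, 37, 34, 25, 22, 22, 20, 14]

a, b = regression_line(x, y)
sum_xx = sum(xi * xi for xi in x)   # sum of the x² column as listed
```

Recomputing the column totals in this way is a cheap check on a hand
calculation, since an error in one total propagates to both a and b.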

A second regression line, that of x on y, may be fitted so as to minimize
the sum of squares of the horizontal distances (parallel to the x axis) from the
points to the line. This line has the equation

(15.12)   $X - \bar{x} = \dfrac{s_{xy}}{s_y^2}(y - \bar{y})$

and so, like the other, passes through the point $(\bar{x}, \bar{y})$. Its equation may
be written

(15.13)   $X = a' + b'y$

where

(15.14)   $b' = (N\sum xy - \sum x\sum y)/(N\sum y^2 - (\sum y)^2) = s_{xy}/s_y^2$

(15.15)   $a' = (\sum x - b'\sum y)/N$

and this form is convenient for calculation. From the data of Table 57 we find

$b' = [12(7098) - 240(384)]/[12(13716) - (384)^2]$
$\;\;\; = -6984/17136 = -0.4076$
$a' = [240 - 384(-0.4076)]/12 = 33.04$
$X = 33.04 - 0.4076y$

This line, marked (2) in Fig. 61, is the appropriate regression line to use in
estimating x for a given value of y, that is, when the y values may be considered
free from error and the x values as scattering about the regression line. Since,
however, we are free to choose which of our variables shall be called x and
which y, and since there is generally one variable which it is reasonable to
regard as dependent on the other, it is not necessary to use both regression lines.

15.3 Coefficient of Correlation. There is said to be positive correlation
between x and y if, for an assigned x greater than $\bar{x}$, the corresponding y values
tend to be greater than $\bar{y}$, and if, for x less than $\bar{x}$, the corresponding y values
tend to be less than $\bar{y}$. The correlation is negative if, for $x > \bar{x}$, y tends to be
less than $\bar{y}$ and if, for $x < \bar{x}$, y tends to be greater than $\bar{y}$. This definition
depends on the assumption that the observed pairs of values $(x_i, y_i)$ form a
sample from an indefinitely large population in which y is a random variable
having a probability distribution depending on x. If the sample is large, the
observed $y_i$ will give a good idea of this probability distribution.

When the variables are correlated there is a tendency for the dots in the
scatter diagram to fall into a sort of band having a fairly definite trend. We
are assuming that this trend is linear, and a theory built upon this assumption
is known as simple or linear correlation.



In Fig. 62 the origin of the x'y'-axes is taken at $(\bar{x}, \bar{y})$. Then the points
of the scatter diagram are distributed over the four quadrants of the x'y'-plane.
The coordinates of the points in the four quadrants have algebraic signs as
follows. In quadrant

I, x' and y' are positive;

II, x' is negative and y' is positive;

III, x' and y' are negative;

IV, x' is positive and y' is negative.

Therefore, the product x'y' is positive for all dots which occur in quadrants
I and III and negative for all dots in quadrants II and IV. The algebraic
sum of all such products describes the distribution of the dots over the
quadrants. When this sum is positive the trend of the dots is through quadrants
III and I; when it is negative the trend is through II and IV; and when zero
there is no trend, the dots being equally distributed over the four quadrants
in the sense that the positive products of x'y' balance the negative products.
Consequently, a natural measure of correlation for the sample would be
obtained by summing the products x'y' for all the observed values and taking
the average by dividing the result by N. Moreover, if we first express x'
and y' in units of their respective standard deviations we obtain a measure of
correlation which is independent of the original units. This measure is
universally denoted by r. Thus we have in symbols,

(15.16)   $r = \dfrac{1}{N}\sum_{i=1}^{N}\left(\dfrac{x_i - \bar{x}}{s_x}\right)\left(\dfrac{y_i - \bar{y}}{s_y}\right)$

It is called the product-moment coefficient of correlation or Pearson's correlation
coefficient.

From the definition of covariance, (15.6), it is clear that

(15.17)   $r = s_{xy}/(s_x s_y) = \operatorname{cov}(x, y)/[\operatorname{var}(x)\cdot\operatorname{var}(y)]^{1/2}$

If the standardized variate $(x - \bar{x})/s_x$ is denoted by z and the standardized
variate $(y - \bar{y})/s_y$ by w, then r is simply the covariance of z and w, that is,
the arithmetic mean of the products $z_i w_i$.

For the purpose of calculation with ungrouped variates, the most
convenient formula for r is

(15.18)   $r = \dfrac{N\sum xy - \sum x\sum y}{[N\sum x^2 - (\sum x)^2]^{1/2}\,[N\sum y^2 - (\sum y)^2]^{1/2}}$

The calculation may frequently be simplified by making use of the following
theorem.

Theorem 1. The value of r is independent of the origin of reference and the
units of measurement.

Proof: Let

$u = \dfrac{x - x_0}{h}, \quad v = \dfrac{y - y_0}{k}$

Then

$x = uh + x_0, \quad y = vk + y_0, \quad s_x = hs_u, \quad s_y = ks_v$

Since $x - \bar{x} = (u - \bar{u})h$ and $y - \bar{y} = (v - \bar{v})k$, we have, on substituting
in (15.16),

$r = \dfrac{1}{N}\sum \left(\dfrac{u - \bar{u}}{s_u}\right)\left(\dfrac{v - \bar{v}}{s_v}\right)$

(15.19)   $\;\; = \dfrac{N\sum uv - \sum u\sum v}{[N\sum u^2 - (\sum u)^2]^{1/2}\,[N\sum v^2 - (\sum v)^2]^{1/2}}$

which is exactly the same formula as (15.18), with u and v instead of x and y.

Example 3. To illustrate the formulas we will compute the value of r for the following
data. Here x = Brokers' Loans in billions of dollars and y = The Annalist's index of
the prices of fifty rail and industrial stocks. We choose u = x - 5.00 and v = y - 250,
h = k = 1.


Month     x       y       u       v       uv        u²       v²

J        5.33    248     0.33     -2     -0.66    0.1089       4
F        5.67    248     0.67     -2     -1.34    0.4489       4
M        5.65    243     0.65     -7     -4.55    0.4225      49
A        5.56    249     0.56     -1     -0.56    0.3136       1
My       5.53    235     0.53    -15     -7.95    0.2809     225
J        5.28    265     0.28     15      4.20    0.0784     225
Jy       5.77    282     0.77     32     24.64    0.5929    1024
A        6.02    303     1.02     53     54.06    1.0404    2809
S        6.35    290     1.35     40     54.00    1.8225    1600
O        6.80    230     1.80    -20    -36.00    3.2400     400
N        4.88    201    -0.12    -49      5.88    0.0144    2401
D        3.45    206    -1.55    -44     68.20    2.4025    1936

Sums                     6.29      0    159.92   10.7659   10678


Computations:

$12\sum u^2 - (\sum u)^2 = 89.6267$
$12\sum v^2 - (\sum v)^2 = 128136$
$12\sum uv - (\sum u)(\sum v) = 1919.04$

$r = \dfrac{1919.04}{(89.6267 \times 128136)^{1/2}} = \dfrac{1919.04}{3388.9} = 0.566$
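
Formula (15.18) and Theorem 1 can be checked together with a short Python
sketch. The function name `pearson_r` is hypothetical, and the data are the x
and y columns of Example 3; the sketch computes r once from the raw values and
once from the coded values u and v, which should agree apart from rounding in
the machine arithmetic.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment coefficient of correlation by formula (15.18)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    return (n * sum_xy - sum_x * sum_y) / sqrt(
        (n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))


# Example 3: brokers' loans (x) and stock price index (y).
x = [5.33, 5.67, 5.65, 5.56, 5.53, 5.28, 5.77, 6.02, 6.35, 6.80, 4.88, 3.45]
y = [248, 248, 243, 249, 235, 265, 282, 303, 290, 230, 201, 206]

# Coded values with x0 = 5.00, y0 = 250, h = k = 1.
u = [xi - 5.00 for xi in x]
v = [yi - 250 for yi in y]

r_raw = pearson_r(x, y)
r_coded = pearson_r(u, v)
```

Coding the data matters for hand work because it keeps the sums small; on a
machine the two routes differ only in the last few binary places.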

In large-scale computations the use of a calculating machine is almost
essential. Students interested in such work should consult Reference 1.

A subscript notation is attached to r when there are several variates, thus,
$r_{xy}$ for the correlation between x and y, $r_{xz}$ for that between x and z, $r_{12}$ for
the correlation between $x_1$ and $x_2$, etc.

15.4 Relation between Coefficients of Regression and Correlation. From
equations (15.9), (15.14), and (15.17) we see that

(15.20)   $r^2 = bb'$

but in computing r from this relation we must give it the sign of b and b'
(both of which have the same sign as $s_{xy}$). For the data of Example 2,
illustrated in Fig. 61, b = -0.641, b' = -0.408, $r = -(0.261)^{1/2} = -0.51$. The
following quotation from Snedecor (Reference 2) throws light on the
distinction between regression and correlation:

In other words, r is the geometric mean of the two regression coefficients. * * * This serves
to clarify the relation of the two coefficients, correlation and regression, in measuring
relationship. The latter is the appropriate one if one variable, y, may be designated as
dependent on the other, x. Values of y may be partly controlled or caused by x, as when the
available amounts of some glandular secretion cause differences in the sizes of organisms.
Or, y may be subsequent to x, as weight gain in nutrition experiments follows the
measurement of initial weight. In such cases, the regression of y on x is usually the
statistic that furnishes the information desired. It is then appropriate to attempt to
estimate the value of y from a knowledge of the corresponding value of x. Correlation, on
the other hand, is the appropriate measure of the relation between two variates like
statures of sister and brother. The two heights are known to be associated through the
complex mechanism of inheritance, but neither may be looked upon as a consequence of the
other. In this sense correlation is a two-way average of relationship, while regression is
directional. Of course, there are many variates whose relationship may be studied by means
of either correlation or regression, or both. It is necessary only to keep clearly in mind
the character of the relation being considered.

The two regression lines may be written

(15.21)   $Y - \bar{y} = r\,\dfrac{s_y}{s_x}(x - \bar{x})$

(15.22)   $X - \bar{x} = r\,\dfrac{s_x}{s_y}(y - \bar{y})$

or, in terms of the standardized variates, z and w,

(15.23)   $W = rz$

(15.24)   $Z = rw$

The coefficient of correlation is therefore the slope of the regression line of
w on z, and its reciprocal is the slope of the regression line of z on w (with equal
scales for z and w, the second regression line makes with the w axis the same
angle that the first regression line makes with the z axis, namely, the angle
whose tangent is r). Both lines go through the origin (see Fig. 63, which is
drawn for the case r > 0).

The larger the value of r, numerically, the smaller the angle between the two
regression lines, and the narrower the band of dots in the scatter diagram. As
we shall see, r is always between -1 and 1. When r = 1 or -1, the two
regression lines coincide. When r = 0, they cut at right angles.
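
The relation (15.20) and the statement that r is the slope of the regression
of w on z can be verified numerically. The sketch below is illustrative only;
it uses the father and son statures of Example 1 as listed, and all variable
names are assumptions of the sketch.

```python
from math import sqrt

x = [69, 70, 69, 68, 70, 73, 69, 67, 69, 64]   # stature of father (inches)
y = [68, 69, 72, 67, 70, 71, 72, 66, 71, 65]   # stature of son (inches)
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x) / n                         # variance of x
s_yy = sum((yi - y_bar) ** 2 for yi in y) / n                         # variance of y
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n   # covariance

b = s_xy / s_xx                  # regression coefficient of y on x, (15.9)
b_prime = s_xy / s_yy            # regression coefficient of x on y, (15.14)
r = s_xy / sqrt(s_xx * s_yy)     # coefficient of correlation, (15.17)

# Least-squares slope of w on z, where z and w are the standardized variates.
z = [(xi - x_bar) / sqrt(s_xx) for xi in x]
w = [(yi - y_bar) / sqrt(s_yy) for yi in y]
slope_w_on_z = sum(zi * wi for zi, wi in zip(z, w)) / sum(zi * zi for zi in z)
```

For these ten pairs r comes out at about 0.715, and both `r * r - b * b_prime`
and `slope_w_on_z - r` vanish to within rounding error.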

15.5 Interpretation of the Coefficient of Correlation. It must be
emphasized that only when the trend of dots in the scatter diagram is a straight
line can r be regarded as a useful measure of the degree of association between
x and y. If the trend is straight, a value of r near 0 means very little
association between x and y, that is, x and y are practically independent
variates, and if r is near 1 or -1, y is highly dependent on x (or x on y).
However, if the trend line is curved, it is possible for r to be very nearly zero
and yet y to be highly dependent on x, as in Fig. 64 (see Reference 3). A
different measure of association is appropriate in such cases and will be
discussed in Chapter XVI.

Fig. 64. r = 0

If x and y are independent, the coefficient of correlation r is zero, but
if r = 0, x and y are not necessarily independent. They are merely
uncorrelated. Incidentally, the phrase "independent variables" in the
statistical sense should not be confused with the phrase "independent variables"
which is used in the ordinary sense of analysis to designate the variables on
which a specified function depends. However, the two usages, though quite
distinct, are not fundamentally contradictory, since functional dependence
can be regarded as a limiting case of statistical dependence.
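
A small numerical case shows how r can vanish although y is completely
determined by x. The data below are artificial (an assumption made for the
illustration, not taken from Fig. 64), and `pearson_r` is a hypothetical
helper implementing (15.18).

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment coefficient of correlation by formula (15.18)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    return (n * sum_xy - sum_x * sum_y) / sqrt(
        (n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))


# y is an exact function of x, but the trend is a parabola, not a straight line.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

r = pearson_r(x, y)
```

Here the positive and negative products about the means balance exactly, so
the numerator of (15.18) is zero even though the dependence is perfect.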

The data should be reasonably homogeneous. If the dots in the scatter
diagram show a tendency to cluster in two or more groups, a spuriously high
value of r may result, due merely to the heterogeneity of the data. Thus in
Fig. 65, the correlation for the two groups of values taken together would be
quite high, whereas each group alone would give a correlation coefficient near
zero. If, on examining the data, some reasonable basis can be found for
separating these two groups, it is probably best to compute a separate
coefficient for each.

Fig. 65

An observed positive (or negative) value of r in a sample is not necessarily
an indication that the true correlation in the population is different from
zero. We shall discuss the question of the significance of r in a later section
and merely remark here that the smaller the sample, the less the significance
of a given value of r. For a class of 19 students, a coefficient of 0.34 was
found between their final marks in mathematics and the initial letters of
their surnames (A being given the value 1, B the value 2, and so on). Since it
seems very unlikely that these variates are other than independent, the
observed r must be purely a sampling fluctuation from a true value of zero.

Even if x and y are independent of each other, it may be that both vary
roughly in the same way (or in opposite ways) with time, and hence, if
observations are spread over a long interval of time, a spurious correlation
may be found. Yule observed a very high correlation (r = 0.951) between the
proportion of marriages celebrated in Anglican churches (in England and Wales)
from 1866 to 1911 and the standardized mortality rate over the same period.
The reason is that both these variables show a well-marked downward trend
over the period. If the trend is eliminated and only the residuals are
correlated we find, as we should expect, a value which is too low to be
significant (0.19 in fact). Yule called correlations such as this nonsense
correlations. Another example was pointed out by the Norwegian statistician
L. V. Charlier, who found a correlation of 0.86 between the size of the stork
population in Oslo over a period of about 40 years and the number of babies
born there each year.
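
The effect of a common trend can be imitated with artificial data. In the
sketch below (an illustration only; the two series are invented and have
nothing to do with Yule's figures) two unrelated fluctuations are superposed
on straight-line trends in time. The raw series are highly correlated, while
the residuals from their fitted trend lines are nearly uncorrelated. The
helper names `pearson_r` and `detrend` are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment coefficient of correlation by formula (15.18)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    return (n * sum_xy - sum_x * sum_y) / sqrt(
        (n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))


def detrend(series):
    """Residuals from the least-squares line fitted against time t = 0, 1, 2, ..."""
    n = len(series)
    t_bar = (n - 1) / 2
    s_bar = sum(series) / n
    b = (sum((t - t_bar) * (s - s_bar) for t, s in enumerate(series))
         / sum((t - t_bar) ** 2 for t in range(n)))
    return [s - s_bar - b * (t - t_bar) for t, s in enumerate(series)]


# Two unrelated fluctuations (invented), each riding on a linear trend.
wiggle_x = [1, -1] * 10
wiggle_y = [1, 1, -1, -1] * 5
x = [t + w for t, w in enumerate(wiggle_x)]
y = [2 * t + w for t, w in enumerate(wiggle_y)]

r_raw = pearson_r(x, y)                             # dominated by the shared trend
r_residual = pearson_r(detrend(x), detrend(y))      # trend eliminated
```

With these series `r_raw` exceeds 0.95 while `r_residual` is close to zero,
which is the pattern described above for nonsense correlations.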

When there is a real correlation between two variables, it may be that a
change in the one variable is the cause of a change in the other. If so, the
former would be taken as x and the latter as y, and we should be interested
in the regression of y on x. Sometimes, however, it is not clear which variable
corresponds to cause and which to effect. In economics, an increase in price
may stimulate an increase in production, but also an increase in production
may be the cause of a lowering of price; the nature of the particular commodity
and its market must be considered.

15.6 Variation Around the Regression Line. The average concentration
of the points in the scatter diagram around the regression line of y on x may
be measured by the expression $(\sum d_i^2)/N$, where $d_i$ is the difference between
an observed $y_i$ and the corresponding $Y_i$ calculated from the regression line.
This expression may be regarded as the variance of the y values in the sample
around the regression line, and we shall denote it by $s_{ey}^2$, which is usually
called the variance of estimate. From the definition of $d_i$,

$\sum d_i^2 = \sum (y_i - Y_i)^2$
$\qquad = \sum [y_i - \bar{y} - b(x_i - \bar{x})]^2$
$\qquad = \sum (y_i - \bar{y})^2 + b^2\sum (x_i - \bar{x})^2 - 2b\sum (x_i - \bar{x})(y_i - \bar{y})$
$\qquad = Ns_y^2 + Nb^2 s_x^2 - 2bNs_{xy}$

Now, by (15.9) and (15.17),

(15.25)   $b^2 s_x^2 = bs_{xy} = r^2 s_y^2$

so that

(15.26)   $s_{ey}^2 = (1 - r^2)s_y^2$

Since $s_{ey}^2$ cannot be negative, it is clear that, as stated previously, $r^2$ cannot
be greater than 1, or, in other words, $-1 \le r \le 1$.

If the deviations $d_i$ of the dots from the regression line are expressed in
standard units, that is, divided by $s_y$,

$\sum (d_i/s_y)^2 = Ns_{ey}^2/s_y^2$
$\qquad\qquad = N(1 - r^2)$
or

(15.27)   $r^2 = 1 - \sum (d_i/s_y)^2/N$

We can consider $1 - \sum (d_i/s_y)^2/N$ as a measure of the goodness of fit of the
regression line (15.23) to the points of the scatter diagram, expressed in
standard units, so that the greater the numerical value of r the better the fit.

Since $Y_i$ is the estimated value given by the regression line for a given
$x_i$, that is,

$Y_i = a + bx_i$

we have

$\sum Y_i = Na + b\sum x_i = \sum y_i$

by the first equation of (15.2). This implies, on dividing by N, that $\bar{Y} = \bar{y}$,
so that the mean of the $Y_i$ is the same as the mean of the $y_i$. We now prove
the following theorem.

Theorem 2. The variance of the $Y_i$ is $r^2$ times the variance of the $y_i$.

Proof:

$Ns_Y^2 = \sum (Y_i - \bar{Y})^2 = \sum (Y_i - \bar{y})^2$
$\qquad = b^2\sum (x_i - \bar{x})^2$, by (15.9) and (15.10)
$\qquad = Nb^2 s_x^2$
$\qquad = Nr^2 s_y^2$, by (15.25),

so that

(15.28)   $r^2 = s_Y^2/s_y^2$

This means that $r^2$ is the ratio of the variance of the computed $Y_i$ to the
variance of the observed $y_i$. The variance of the $Y_i$ is sometimes called the
explained variance; it is that part of the total variance which is accounted
for or explained by the regression of y on x. From (15.26),

(15.29)   $s_{ey}^2 = s_y^2 - s_Y^2$

and so $s_{ey}^2$ is equal to the residual or unexplained variance.

Theorem 3. The correlation between the $y_i$ and the $Y_i$ is the same as that
between the $x_i$ and the $y_i$.

Proof:

$r_{yY} = \dfrac{\sum (y - \bar{y})(Y - \bar{Y})}{Ns_y s_Y} = \dfrac{b\sum (y - \bar{y})(x - \bar{x})}{Ns_y s_Y}$, since $\bar{Y} = \bar{y}$ and $Y - \bar{y} = b(x - \bar{x})$,

$\qquad = \dfrac{bs_{xy}}{s_y s_Y} = \dfrac{r^2 s_y^2}{s_y s_Y} = r$ in numerical value, by (15.25) and (15.28).

Theorem 4. $\sum d^2 = \sum y^2 - a\sum y - b\sum xy$

Proof:

$\sum d^2 = \sum (y - a - bx)(y - a - bx)$
$\qquad = \sum y(y - a - bx) - a\sum (y - a - bx) - b\sum x(y - a - bx)$

But, by (14.8) and (14.9),

$\sum (y - a - bx) = 0, \quad \sum x(y - a - bx) = 0$

so that

$\sum d^2 = \sum y(y - a - bx)$
$\qquad = \sum y^2 - a\sum y - b\sum xy$

This theorem provides a check on the calculation of $\sum d^2$. It is usually
necessary, however, to know a and b accurately to several significant figures,
because the factors $\sum y$ and $\sum xy$ may be relatively large while the right-hand
side as a whole is quite small.

All the results in this section may be applied, if desired, to the regression
line of x on y. It is merely necessary to interchange x and y and write b' for b.
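
The identities of this section lend themselves to a numerical check. The
following sketch (illustrative only; all names are assumptions, and the data
are the statures of Example 1 as listed) computes the deviations d from the
fitted line and compares both sides of (15.26), (15.28), and Theorem 4.

```python
x = [69, 70, 69, 68, 70, 73, 69, 67, 69, 64]   # stature of father (inches)
y = [68, 69, 72, 67, 70, 71, 72, 66, 71, 65]   # stature of son (inches)
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x) / n
s_yy = sum((yi - y_bar) ** 2 for yi in y) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n

b = s_xy / s_xx                      # (15.9)
a = y_bar - b * x_bar
r_squared = s_xy ** 2 / (s_xx * s_yy)

fitted = [a + b * xi for xi in x]                     # the Y_i
d = [yi - Yi for yi, Yi in zip(y, fitted)]            # deviations d_i
sum_d2 = sum(di ** 2 for di in d)

variance_of_estimate = sum_d2 / n                                  # s_ey^2
explained_variance = sum((Yi - y_bar) ** 2 for Yi in fitted) / n   # s_Y^2

# Right-hand side of Theorem 4.
theorem4_rhs = (sum(yi * yi for yi in y) - a * sum(y)
                - b * sum(xi * yi for xi, yi in zip(x, y)))
```

Note that `theorem4_rhs` is a small difference of large numbers, which is the
reason given above for carrying a and b to several significant figures in hand
computation.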

If the foregoing theorems are intended to apply to a population of pairs of
values (x, y), from which the observed N pairs form a random sample, a
slight modification is necessary. Just as the sample variance $s_y^2$ is not an
unbiased estimate of the population variance $\sigma_y^2$, so the variance of estimate
$s_{ey}^2$ is not an unbiased estimate of the population variance of estimate
$\sigma_{ey}^2$. It turns out that $Ns_y^2/(N - 1)$ and $Ns_{ey}^2/(N - 2)$ are unbiased
estimates of $\sigma_y^2$ and $\sigma_{ey}^2$, respectively. If we denote these estimates by
$\hat{\sigma}_y^2$ and $\hat{\sigma}_{ey}^2$, we have, instead of (15.26), the relation

(15.30)   $(N - 2)\hat{\sigma}_{ey}^2 = (1 - r^2)(N - 1)\hat{\sigma}_y^2$

15.7 Significance of the Regression Coefficients. The regression
coefficient b, which is the slope of the trend line of y on x, is usually
determined from a sample of N pairs of values of x and y. The sample regression
is not usually of great interest in itself, but it enables us to form an
estimate of the true regression coefficient in the population from which the
sample is taken. If we assume (1) that the true regression is linear, given by
$\eta = \alpha + \beta x$; (2) that each $y_i$ in the population is a value of a random
normal variate, independent of the other y's; (3) that the mean of the $y_i$ in
the population for a given $x_i$ is the corresponding value of
$\eta_i = \alpha + \beta x_i$; and (4) that the variance of the $y_i$ is the same for all
values of x, namely, $\sigma_{ey}^2$; then we can prove that b is normally distributed
about $\beta$ with variance $\sigma_{ey}^2/Ns_x^2$. Since, however, we do not know
$\sigma_{ey}^2$, but only the estimate $\hat{\sigma}_{ey}^2$, we substitute this for $\sigma_{ey}^2$ and can
then show that $N^{1/2}s_x(b - \beta)/\hat{\sigma}_{ey}$ has Student's t distribution with
N - 2 degrees of freedom. By (15.30), since $(N - 1)\hat{\sigma}_y^2 = Ns_y^2$,

$\hat{\sigma}_{ey} = N^{1/2}s_y(1 - r^2)^{1/2}/(N - 2)^{1/2}$

so that, because $s_x/s_y = r/b$,

(15.31)   $t = \dfrac{(b - \beta)\,r}{b}\left(\dfrac{N - 2}{1 - r^2}\right)^{1/2}$

In order to determine whether an observed value of b differs significantly
from zero, we put $\beta = 0$ in (15.31) and calculate

(15.32)   $t = r\left(\dfrac{N - 2}{1 - r^2}\right)^{1/2}$

If this is greater than the value $t_\alpha$ corresponding to an assigned significance
level and the given N - 2, we can say that the observed b is significant. If
we merely want to know whether there is any correlation in the population,
regardless of the sign, we make a two-tailed test and double the probabilities
given at the head of Table II in the Appendix.

Alternatively, we can use (15.31) to establish confidence limits for $\beta$.
Thus if $t_{0.05}$ is the value of t for N - 2 degrees of freedom, such that the
probability of a numerically greater deviation is 0.05, then, with
probability 0.95,

$|b - \beta|\,\dfrac{r}{b}\left(\dfrac{N - 2}{1 - r^2}\right)^{1/2} \le t_{0.05}$

and the 95% confidence limits for $\beta$ are given by

(15.33)   $\beta = b \pm \dfrac{b}{r}\,t_{0.05}\left(\dfrac{1 - r^2}{N - 2}\right)^{1/2}$

266 


Linear Regression and Correlation 


XV 


Example 4. For a sample of 10 the observed values of $b$ and $r$ are 0.163 and 0.582. Does
the value of $b$ differ significantly from zero?

For 8 degrees of freedom, $t_{0.05} = 2.306$. Also $b/r = 0.280$ and
$[(1 - r^2)/(N - 2)]^{1/2} = 0.288$, so that $\beta = 0.163 \pm 0.186$. The 95% confidence
limits are $-0.023$ and 0.349, and as this interval includes zero, the value of $b$ is
not significant.
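
The arithmetic of Example 4 can be checked with a few lines of Python. This is only a sketch: the function names are our own, and the critical value 2.306 is taken from the text rather than computed.

```python
import math

def slope_t_statistic(r, n):
    """Student's t for testing beta = 0, as in (15.32)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def slope_confidence_limits(b, r, n, t_crit):
    """Confidence limits for beta, as in (15.33).

    t_crit is the tabulated two-tailed value of t for n - 2 degrees
    of freedom; it is supplied by the caller, not computed here.
    """
    half_width = (b / r) * t_crit * math.sqrt((1.0 - r * r) / (n - 2))
    return b - half_width, b + half_width

# Values quoted in Example 4.
lower, upper = slope_confidence_limits(b=0.163, r=0.582, n=10, t_crit=2.306)
t_value = slope_t_statistic(r=0.582, n=10)
print(round(lower, 3), round(upper, 3), round(t_value, 2))
```

The interval straddles zero and the computed $t$ is below 2.306, which agrees with the conclusion of the example.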


The most dubious of the assumptions we have to make in getting the
distribution of $b$ is probably that of the constancy of variance of $y$ for all values
of $x$. A distribution of $x$ and $y$ values satisfying this condition is said to be
homoscedastic (from Greek words meaning "equal scattering"). If the condition
is clearly not satisfied, it is sometimes possible to transform the $y$ variate
(for example, by taking logarithms of $y$) so as to render it more nearly
homoscedastic.


15.8 Significance of the Correlation Coefficient. We denote the true
coefficient of correlation in the population by $\rho$ (rho). If the regression is
linear and $\rho = 0$ we must have $\beta = 0$, so that, as mentioned in the preceding
section, the quantity $r\left(\dfrac{N - 2}{1 - r^2}\right)^{1/2}$ has Student's $t$ distribution, provided $y$
satisfies the various conditions laid down.

Example 5. From the data of Example 4, $r\left(\dfrac{N - 2}{1 - r^2}\right)^{1/2} = 0.582/0.288 = 2.02$, which is
below the 5% significance level ($t_{0.05} = 2.306$ for 8 degrees of freedom). The observed
$r$ is therefore not significantly different from zero. This illustrates the fact that a
coefficient of correlation calculated from a sample as small as 10 is often highly unreliable.

When $N$ is large, the distribution of the sample coefficient of correlation $r$
(if the true value $\rho$ is zero) is nearly normal with mean zero and variance
$1/(N - 1)$.

If $\rho$ is not zero, the distribution of $r$ is skew and mathematically complicated,
even when the conditions laid down in §15.7 are satisfied. It was shown by
R. A. Fisher that if we write

(15.34)    $z' = \tfrac{1}{2}\log_e[(1 + r)/(1 - r)]$

then $z'$ is approximately a normal variate with mean $\zeta = \tfrac{1}{2}\log_e[(1 + \rho)/(1 - \rho)]$
and variance $1/(N - 3)$. The relation (15.34) can also be written
$r = \tanh z'$ (hyperbolic tangent of $z'$) and a table of this function is given in the
Appendix (Table VI). With the help of this table we can easily find confidence
limits for $\rho$ from a given value of $r$.

Example 6. In a class of 25 students we find a correlation coefficient of 0.731 between the
scores on two tests. Establish 95% confidence limits for the value of the correlation
coefficient in the population.

Corresponding to $r = 0.731$, we find from the table (reading from the inside out to the
margin) that $z' = 0.931$. The standard deviation of $z'$ is $1/(22)^{1/2} = 0.213$, so that, taking
the 5% point for the normal law as 1.96, the 95% confidence limits for $\zeta$ (where $\rho = \tanh \zeta$)
are given by $\zeta = 0.931 \pm 1.96(0.213) = 0.513$ and 1.349.

Consulting the table again, we see that these values correspond to $\rho = 0.472$ and 0.874,
respectively, which are the required confidence limits. ($\zeta$ is the Greek letter zeta.)
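
Where Table VI is not at hand, the same limits can be obtained from the inverse hyperbolic tangent directly. The following is a minimal sketch; the function name is our own and the normal 5% point of 1.96 is the one used in the example.

```python
import math

def fisher_confidence_limits(r, n, z_crit=1.96):
    """Confidence limits for rho from Fisher's transformation (15.34).

    z' = atanh(r) is treated as normal with standard deviation
    1/sqrt(n - 3); the limits for zeta are transformed back with tanh.
    """
    z_prime = math.atanh(r)
    sd = 1.0 / math.sqrt(n - 3)
    return math.tanh(z_prime - z_crit * sd), math.tanh(z_prime + z_crit * sd)

# Data of Example 6: r = 0.731 from 25 students.
rho_low, rho_high = fisher_confidence_limits(0.731, 25)
print(round(rho_low, 3), round(rho_high, 3))
```

The limits are not symmetric about $r$, which reflects the skewness of the distribution of $r$ when $\rho$ is not zero.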

F. N. David (Reference 4) has constructed diagrams, one of which is reproduced
as Chart II in the Appendix, giving confidence limits for $\rho$ corresponding
to different values of $r$, for various sample sizes from 3 to 400. To use the
diagram for the data of Example 6, we simply follow up the ordinate at the
point $r = 0.731$ on the horizontal scale until we cross the two curves marked
25, one in the lower half and one in the upper half of the diagram. The
corresponding values of $\rho$ (on the vertical scale) give the confidence limits.

15.9 Accuracy of Estimate from Regression. The quantity $s_{ey}$ given by
(15.26) is called the standard error of estimate because (for large samples) it
is a measure of the error to be expected in estimating $y$ for a given value of $x$,
by means of the computed value $Y$. When $r = 0$, (15.21) becomes $Y = \bar{y}$,
which means that the best estimate of $y$ for any value of $x$ is the mean of the
$y$-distribution. In other words, knowledge of $x$ is of no value in predicting $y$.
When $r = 0$ in (15.26), $s_{ey} = s_y$. This is to be expected since the dispersion
$s_{ey}$ about the line $y = \bar{y}$ is the same as the dispersion $s_y$ of the given $y$'s about
their mean.

In Fig. 66 parallel lines are drawn at a vertical distance of $s_{ey}$ on either side
of the regression line $RR'$. This strip will enclose about two-thirds of the
whole distribution (assumed normal). The strip between the lines on either
side of $y = \bar{y}$, at a distance $s_y$ from it, encloses about two-thirds of the
distribution when $r = 0$. As $|r|$ increases from 0, the line $RR'$ rotates from
the horizontal toward the final position it would have for $|r| = 1$, and at
the same time $s_{ey}$ decreases from $s_y$ to 0.

As $|r|$ increases, $k$ decreases, where $k = s_{ey}/s_y = (1 - r^2)^{1/2}$. The improvement
in the estimation of $y$ from a knowledge of the regression may be measured
by $k' = 1 - k$. Table 58 gives values of $k$ and $k'$ for given values of $r$.
We see from this table that when $r = 0.5$, for example, $k' = 0.134$. This
means that $s_{ey}$ has been reduced 13.4% from the value $s_y$, in virtue of the
correlation that exists, so that the standard error of our estimate is reduced by that
much from what it would be if we used simply the estimate $\bar{y}$. It is clear
that a high correlation is necessary in order to make a substantial reduction
in the error of estimate.


Table 58. Values of r and the Corresponding Values of k and k'

      r        k        k'
    0.1      0.995    0.005
    0.2      0.980    0.020
    0.3      0.954    0.046
    0.4      0.917    0.083
    0.5      0.866    0.134
    0.6      0.800    0.200
    0.7      0.714    0.286
    0.8      0.600    0.400
    0.9      0.436    0.564
    0.92     0.392    0.608
    0.94     0.341    0.659
    0.96     0.280    0.720
    0.98     0.199    0.801
    1.00     0.000    1.000
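
The entries of Table 58 follow directly from $k = (1 - r^2)^{1/2}$ and $k' = 1 - k$, and can be regenerated for any $r$. A short sketch (the function name `k_factor` is our own):

```python
import math

def k_factor(r):
    """Ratio s_ey / s_y = sqrt(1 - r^2) for a given correlation r."""
    return math.sqrt(1.0 - r * r)

R_VALUES = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9,
            0.92, 0.94, 0.96, 0.98, 1.0)

print("   r      k      k'")
for r in R_VALUES:
    k = k_factor(r)
    print(f"{r:5.2f}  {k:5.3f}  {1.0 - k:5.3f}")
```

Even at $r = 0.8$ the standard error of estimate is reduced by only 40%, which is the point made in the text.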


With small samples, it may be shown that the variance of $y$ about the
regression line, for any given $x$, is not constant, but is a function of $x$. It is
given by

(15.35)    $\sigma_{(y - Y)}^2 = \sigma_{ey}^2\left[1 + \dfrac{1}{N} + \dfrac{(x - \bar{x})^2}{N s_x^2}\right]$

and in this formula $\sigma_{ey}^2$ must be replaced by its estimate $\hat\sigma_{ey}^2 = N s_{ey}^2/(N - 2)$.
The strip around the regression line is wider as the distance of $x$ from $\bar{x}$
increases. However, the variation in width is not very pronounced for moderately
large values of $N$, and for $x$ not too far from $\bar{x}$.
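
To see how the strip widens, the standard deviation implied by (15.35) can be evaluated at several values of $x$. The sketch below assumes the bracketed factor $1 + 1/N + (x - \bar{x})^2/(N s_x^2)$; the sample values ($N = 10$, $\bar{x} = 20$, $s_x = 5$, $s_{ey} = 3$) are hypothetical and chosen only for illustration.

```python
import math

def estimate_sd(x, x_mean, s_x, s_ey, n):
    """Standard deviation of y about the fitted line at a given x.

    Uses the unbiased estimate n * s_ey^2 / (n - 2) in place of the
    unknown population variance of estimate.
    """
    sigma2_hat = n * s_ey ** 2 / (n - 2)
    factor = 1.0 + 1.0 / n + (x - x_mean) ** 2 / (n * s_x ** 2)
    return math.sqrt(sigma2_hat * factor)

# Hypothetical sample: the strip is narrowest at the mean of x.
for x in (20, 25, 30, 40):
    print(x, round(estimate_sd(x, x_mean=20, s_x=5, s_ey=3, n=10), 3))
```

With a larger $N$ the last two terms of the factor shrink and the strip becomes nearly parallel to the regression line.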

15.10 Calculation of r for Grouped Variates. When the sample to be
studied is large, it is more convenient to replace the scatter diagram by a
correlation table. We may divide the xy-plane into rectangles of convenient
size, and all points of the scatter diagram falling within any rectangle are
thought of as being concentrated at the center of this rectangle. A number
is then written within the rectangle to designate the number of points at its
center. A correlation table is therefore a two-way frequency table exhibiting
the frequencies in each class interval.
Suppose Table 59 is constructed in this way for a set of average daily grades
(x) and final examination grades (y) of 100 students. When the data have
been thus grouped into classes, the class marks are regarded as the variate
values. Thus in Table 59 there are 9 students whose daily grades are 87 and


Table 59

                       x (average daily grade)
   y class  mid-y | 65-69  70-74  75-79  80-84  85-89  90-94  95-99 |  f_y
                  |   67     72     77     82     87     92     97  |
   90-94     92   |                         1      2      3      1  |    7
   85-89     87   |                  1      3      8      1      5  |   18
   80-84     82   |    4      4      6      4      9      1         |   28
   75-79     77   |    3      3      7      6      4                |   23
   70-74     72   |    2      3      5      6      1      1         |   18
   65-69     67   |    3      2                                     |    5
   60-64     62   |    1                                            |    1
   f_x            |   13     12     19     20     24      6      6  |  100



whose final examination grades are 82. The last column labeled $f_y$ represents
the distribution of $y$ variates and the last row labeled $f_x$ represents the
distribution of $x$ variates. A correlation table is thus a bivariate distribution.
In Table 59 the width of the class interval is the same for $x$ and $y$, but of
course this is not generally the case.

The rectangles containing the frequencies are called cells. The frequency
in a typical cell is denoted by $f_{xy}$, meaning the frequency in the cell whose
coordinates are $x$ and $y$, where $x$ and $y$ are the mid-values of the class intervals.
Both columns and rows are subdistributions of the total frequency $N$. Each
column is a frequency distribution of $y$'s corresponding to a mid-$x$ value.
Similarly, each row is a frequency distribution of $x$'s corresponding to a mid-$y$
value. The sum along any row is denoted by $\sum_x f_{xy}$, being the sum of the
frequencies in the $(x, y)$ cells in the $x$-direction. Since the marginal total
for any row is the total frequency corresponding to a given value of $y$, it is
therefore written in the column headed $f_y$. Thus, in Table 59, for $y = 92$,

    $\sum_x f_{xy} = 1 + 2 + 3 + 1 = 7$

Similarly, $\sum_y f_{xy}$ denotes a summation in the $y$-direction of all the entries in
a column, corresponding to a fixed value of $x$, so it denotes an entry in the
bottom row which contains the $f_x$ frequencies. Thus, for $x = 67$,

    $\sum_y f_{xy} = 4 + 3 + 2 + 3 + 1 = 13$

The total frequency $N$ may be obtained by adding either the marginal
sub-totals $f_x$, the marginal sub-totals $f_y$, or all the cell frequencies $f_{xy}$. That is,

(15.36)    $N = \sum_y f_y = \sum_x f_x = \sum_{xy} f_{xy}$

The marginal sub-totals do not determine the cell frequencies uniquely. For
example, we might replace the four cell frequencies in the upper
right-hand corner of Table 59 by the cell frequencies shown alongside
without disturbing the sub-totals.

The mean of all the $x$'s for such a table is given by

(15.37)    $\bar{x} = \dfrac{1}{N}\sum_{xy} x f_{xy} = \dfrac{1}{N}\sum_x x \sum_y f_{xy} = \dfrac{1}{N}\sum_x x f_x$

because in summing for $y$ we keep $x$ constant. The mean $\bar{x}$ is the same,
therefore, as the mean of the marginal distribution of $x$, and similarly

(15.38)    $\bar{y} = \dfrac{1}{N}\sum_y y f_y$

Any column of the table is an $x$-array of $y$'s, so we shall use the symbol $\bar{y}_x$
for the mean of a column. Similarly $\bar{x}_y$ will denote the mean of a $y$-array of
$x$'s, that is, of a row.


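Theorem 5. The mean $\bar{y}$ is the arithmetic mean of the column means $\bar{y}_x$
weighted with the column frequencies $f_x$, that is, $\bar{y} = \dfrac{1}{N}\sum_x f_x \bar{y}_x$.

Proof: By definition, $\bar{y}_x = \dfrac{1}{f_x}\sum_y y f_{xy}$, so that

    $\dfrac{1}{N}\sum_x f_x \bar{y}_x = \dfrac{1}{N}\sum_x \sum_y y f_{xy} = \dfrac{1}{N}\sum_{xy} y f_{xy} = \bar{y}$

The variances of $x$ and $y$ are given by

(15.39)    $s_x^2 = \dfrac{1}{N}\sum_x x^2 f_x - \bar{x}^2$,    $s_y^2 = \dfrac{1}{N}\sum_y y^2 f_y - \bar{y}^2$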

Just as in the case of a grouped one-way frequency distribution, it was
found convenient to choose an arbitrary origin and take the class interval as
unit, so we now do likewise with both variates. Let

(15.40)    $u = (x - x_0)/h$,    $v = (y - y_0)/k$

where $h$ is the class-interval for $x$ and $k$ that for $y$. Then, as before,

(15.41)    $\bar{x} = \bar{u}h + x_0$,    $\bar{y} = \bar{v}k + y_0$

where

    $\bar{u} = \sum_u u f_u/N$,    $\bar{v} = \sum_v v f_v/N$

Here $f_u$ is the same as $f_x$, being the column frequency for a given $x$ (or $u$).
Changing the unit of $x$ does not affect the frequencies. Similarly $f_v$ is the
same as $f_y$. Furthermore, as we found in Chapter VI,

(15.42)    $s_x^2 = h^2 s_u^2$,    $s_y^2 = k^2 s_v^2$

In computing $r$ we need also the covariance of $x$ and $y$, namely,

(15.43)    $s_{xy} = \dfrac{1}{N}\sum_{xy} f_{xy}(x - \bar{x})(y - \bar{y})$

and, from (15.40) and (15.41),

    $x - \bar{x} = h(u - \bar{u})$,    $y - \bar{y} = k(v - \bar{v})$

so that

(15.44)    $s_{xy} = \dfrac{hk}{N}\sum_{uv} f_{uv}(u - \bar{u})(v - \bar{v}) = hk\,s_{uv}$

Then

(15.45)    $r = s_{xy}/(s_x s_y) = s_{uv}/(s_u s_v)$

so that in computing $r$ we can work throughout in the new and simpler variates
$u$ and $v$. For the calculation of $s_u^2$ and $s_v^2$ we need to form the sums
$\sum u f_u$, $\sum u^2 f_u$, $\sum v f_v$, and $\sum v^2 f_v$, and a computation scheme for these is
set out in Table 60. Additional columns and rows for Charlier checks are desirable,
but have been omitted from the printed table for the sake of clearness. The only
new point is the computation of $s_{uv}$.


Now

    $s_{uv} = \dfrac{1}{N}\sum_{uv} f_{uv}\,uv - \bar{u}\bar{v}$

and

    $\sum_{uv} f_{uv}\,uv = \sum_u u \sum_v v f_{uv} = \sum_u uV$

where

(15.46)    $V = \sum_v v f_{uv}$

Table 60. Computation of r for Data of Table 59

               u |   -3    -2    -1     0     1     2     3 |
    v    y \ x   |   67    72    77    82    87    92    97 |  f_v   v f_v  v^2 f_v     U     vU
    3     92     |                      1     2     3     1 |    7     21      63      11     33
    2     87     |                1     3     8     1     5 |   18     36      72      24     48
    1     82     |    4     4     6     4     9     1       |   28     28      28     -15    -15
    0     77     |    3     3     7     6     4             |   23      0       0     -18      0
   -1     72     |    2     3     5     6     1     1       |   18    -18      18     -14     14
   -2     67     |    3     2                               |    5    -10      20     -13     26
   -3     62     |    1                                     |    1     -3       9      -3      9
       f_u       |   13    12    19    20    24     6     6 |  100     54     210     -28    115
      u f_u      |  -39   -24   -19     0    24    12    18 |  -28
     u^2 f_u     |  117    48    19     0    24    24    54 |  286
        V        |   -7    -3     3     7    30    11    13 |   54
       uV        |   21     6    -3     0    30    22    39 |  115


so that

(15.47)    $s_{uv} = \dfrac{1}{N}\sum_u uV - \bar{u}\bar{v}$

Also

    $\sum_{uv} f_{uv}\,uv = \sum_v v \sum_u u f_{uv} = \sum_v vU$

where

(15.48)    $U = \sum_u u f_{uv}$

so that

(15.49)    $\sum_u uV = \sum_v vU$

The equality of these two expressions serves as a check on the calculations.
Two other checks are provided by the relations

(15.50)    $\sum_v U = \sum_{uv} u f_{uv} = \sum_u u f_u$

(15.51)    $\sum_u V = \sum_{uv} v f_{uv} = \sum_v v f_v$

The $U$ entries in the table are found by multiplying each cell frequency by
the corresponding $u$ and adding along the row. Thus for the first row,
$U = 0(1) + 1(2) + 2(3) + 3(1) = 11$. The separate products may be
placed in the upper right-hand corners of the cells. Similarly, the $V$ entries
are found by multiplying each cell frequency by the corresponding $v$ and
adding up the column. For the first column, $V = -3(1) - 2(3) - 1(2) +
0(3) + 1(4) = -7$. The products may be placed in the lower left-hand
corners.

From the table we have

    $s_u^2 = 286/100 - (-28/100)^2 = 2.782$

    $s_v^2 = 210/100 - (54/100)^2 = 1.808$

    $s_{uv} = 115/100 - (-28/100)(54/100) = 1.301$

    $r = \dfrac{1.301}{(2.782 \times 1.808)^{1/2}} = \dfrac{1.301}{2.243} = 0.580$

Note that the data have been arranged so that the directions of increasing $x$
and increasing $y$ are the conventional ones along the axes $Ox$, $Oy$. A positive
value of $r$ then corresponds to a linear trend line of positive slope.
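
The computation scheme of Table 60 can be reproduced as a short program. In the sketch below the cell frequencies are an assumption: they were rebuilt from the marginal totals and the worked sums quoted in the text ($U = 11$ for the first row, $V = -7$ for the first column, and so on), and the function name is our own.

```python
import math

# Assumed cell frequencies f_uv. Rows run from v = 3 (y = 92) down to
# v = -3 (y = 62); columns run from u = -3 (x = 67) up to u = 3 (x = 97).
TABLE = [
    [0, 0, 0, 1, 2, 3, 1],
    [0, 0, 1, 3, 8, 1, 5],
    [4, 4, 6, 4, 9, 1, 0],
    [3, 3, 7, 6, 4, 0, 0],
    [2, 3, 5, 6, 1, 1, 0],
    [3, 2, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0],
]
U_VALUES = [-3, -2, -1, 0, 1, 2, 3]
V_VALUES = [3, 2, 1, 0, -1, -2, -3]

def grouped_correlation(table, u_values, v_values):
    """Return r and the five sums used in the coded-variate method."""
    n = sum(sum(row) for row in table)
    f_v = [sum(row) for row in table]          # row totals
    f_u = [sum(col) for col in zip(*table)]    # column totals
    sum_u = sum(u * f for u, f in zip(u_values, f_u))
    sum_u2 = sum(u * u * f for u, f in zip(u_values, f_u))
    sum_v = sum(v * f for v, f in zip(v_values, f_v))
    sum_v2 = sum(v * v * f for v, f in zip(v_values, f_v))
    sum_uv = sum(u * v * f
                 for v, row in zip(v_values, table)
                 for u, f in zip(u_values, row))
    u_bar, v_bar = sum_u / n, sum_v / n
    s_u2 = sum_u2 / n - u_bar ** 2
    s_v2 = sum_v2 / n - v_bar ** 2
    s_uv = sum_uv / n - u_bar * v_bar
    r = s_uv / math.sqrt(s_u2 * s_v2)
    return r, (sum_u, sum_u2, sum_v, sum_v2, sum_uv)

r, sums = grouped_correlation(TABLE, U_VALUES, V_VALUES)
print(sums, round(r, 3))
```

Because $r$ is unchanged by the change of origin and unit in (15.40), the class intervals $h$ and $k$ never enter the calculation.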

The general effect of errors of measurement in $x$ and $y$ is to decrease the
coefficient of correlation, because of the increased scattering of the
observations. This effect is known as attenuation. The effect of grouping is also
to reduce the coefficient, because on the whole the errors caused by grouping
cancel in $s_{uv}$ but tend to increase $s_u^2$ and $s_v^2$ unless the latter are corrected by
applying Sheppard's correction. The number of cells should be large, preferably
10 or 12 each way, in order to reduce grouping errors.

Commercial charts. Computations can be expedited by the use of commercially
prepared correlation charts. Several types of chart are available
on the market. In her book (Reference 17, §0.4), Professor Helen M. Walker
explains the merits of two of these which are recommended. She also gives
the following advice to beginners: "A chart is not a crutch to help the novice.
It is a means of speeding up operations after they are well understood."

15.11 Regression Lines for a Correlation Table. Suppose for each column
of a correlation table we compute the mean $\bar{y}_x$ and place a small circle at this
value of $y$ in the middle of the column. (The easiest way to do this is to
plot $\bar{v}_u = \sum_v v f_{uv}/f_u = V/f_u$ on the vertical $v$ scale.) If we now fit by least
squares a straight line $y = a_1 + b_1 x$ to these column means, each weighted with
its own column frequency $f_x$, this line turns out to be the same as the ordinary
regression line of $y$ on $x$ fitted to the scatter diagram (all the dots in a cell being
taken as lying at the center of the cell). The least squares condition is

(15.52)    $\sum_x f_x(\bar{y}_x - a_1 - b_1 x)^2 = \min$

By partially differentiating this with respect to $a_1$ and $b_1$ and setting the
results equal to zero, or (without the calculus) by treating the left-hand side
as a quadratic expression in both $a_1$ and $b_1$ and using the method of §14.9, we
obtain the simultaneous equations:

(15.53)    $a_1\sum_x f_x + b_1\sum_x x f_x = \sum_x f_x \bar{y}_x$

(15.54)    $a_1\sum_x x f_x + b_1\sum_x x^2 f_x = \sum_x x f_x \bar{y}_x$

Since

    $\sum_x f_x \bar{y}_x = \sum_{xy} y f_{xy} = N\bar{y}$

and

    $\sum_x x f_x \bar{y}_x = \sum_{xy} xy f_{xy}$

these equations become

(15.55)    $a_1 + b_1\bar{x} = \bar{y}$

(15.56)    $N a_1\bar{x} + b_1\sum_x x^2 f_x = \sum_{xy} xy f_{xy}$

whence

(15.57)    $b_1 = \dfrac{\sum_{xy} xy f_{xy} - N\bar{x}\bar{y}}{\sum_x x^2 f_x - N\bar{x}^2} = \dfrac{r s_y}{s_x}$,    $a_1 = \bar{y} - b_1\bar{x}$

so that the fitted line is

(15.58)    $y - \bar{y} = \dfrac{r s_y}{s_x}(x - \bar{x})$

which is the regression line of $y$ on $x$.
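
The identity just proved can be verified numerically: a weighted least squares line through the column means coincides with the regression line computed from all the cells. The sketch below uses cell frequencies for Table 59 that are an assumption (rebuilt from the marginal totals and worked sums in §15.10), and the function names are our own.

```python
# Assumed cell frequencies of Table 59 (rows y = 92 ... 62, columns x = 67 ... 97).
TABLE = [
    [0, 0, 0, 1, 2, 3, 1],
    [0, 0, 1, 3, 8, 1, 5],
    [4, 4, 6, 4, 9, 1, 0],
    [3, 3, 7, 6, 4, 0, 0],
    [2, 3, 5, 6, 1, 1, 0],
    [3, 2, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0],
]
X_MID = [67, 72, 77, 82, 87, 92, 97]
Y_MID = [92, 87, 82, 77, 72, 67, 62]

def weighted_line(xs, ys, weights):
    """Weighted least squares line y = a + b x; returns (a, b)."""
    n = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, xs)) / n
    y_bar = sum(w * y for w, y in zip(weights, ys)) / n
    s_xy = sum(w * (x - x_bar) * (y - y_bar)
               for w, x, y in zip(weights, xs, ys))
    s_xx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    b = s_xy / s_xx
    return y_bar - b * x_bar, b

# Line through the column means, weighted by the column frequencies.
columns = list(zip(*TABLE))
f_x = [sum(col) for col in columns]
col_means = [sum(y * f for y, f in zip(Y_MID, col)) / sum(col) for col in columns]
a_means, b_means = weighted_line(X_MID, col_means, f_x)

# Ordinary regression of y on x, every dot placed at its cell center.
cells = [(x, y, f) for y, row in zip(Y_MID, TABLE)
         for x, f in zip(X_MID, row) if f > 0]
a_cells, b_cells = weighted_line([c[0] for c in cells],
                                 [c[1] for c in cells],
                                 [c[2] for c in cells])

print(round(a_means, 4), round(b_means, 4))
print(round(a_cells, 4), round(b_cells, 4))
```

The two printed lines agree, as (15.57) requires, because the deviations of the dots from their own column mean contribute nothing to the cross-product sum.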


Table 61. Test Grades and Productive Ability

[Correlation table of test grade $x$ (columns) against productive ability $y$ (rows),
with row means $\bar{x}_y$ and column means $\bar{y}_x$. The cell entries are not legible in
this copy; the legible column frequencies are $f_x$ = 14, 32, 49, 55, 54, 35, 14.]

Example 7. A personnel manager in charge of hiring employees of a manufacturing
plant has instituted a system of mental tests for applicants and has gathered the data shown
in Table 61, where $x$ represents the grade made on the tests and $y$ the production ability
of the applicants after they have been hired (measured as percentage of a certain standard
of production).

In order to demonstrate to the company's management the connection between his
mental tests and the productivity of the employees he has hired, the personnel manager
does the following: (1) computes the coefficient of correlation between the two series;
(2) shows what the estimated productivity of employees would be whose grades in the
mental test fell on the mid-points of the class intervals of the mental test data.

The means of the columns and of the rows are given in the table. In addition, he obtains
the following results:

    $\bar{x} = 42.17$,    $s_x = 8.40$,     $r = 0.417$,

    $\bar{y} = 87.31$,    $s_y = 17.41$,    $b = r s_y/s_x = 0.864$.

Therefore, the line of regression of $y$ on $x$ is

    $Y_x - 87.31 = 0.864(x - 42.17)$

or

    $Y_x = 0.864x + 50.88$

Fig. 67. Means of Columns and Line of Regression of y on x for Table 61

This is the equation of the line that best fits the points which designate the means of the
columns (Fig. 67). Hence, for an assigned value of $x$, this equation gives the value of $y$
which is the expected mean of the column defined by the assigned value of $x$. The personnel
manager is thus prepared to predict the productivity of applicants on the basis of their
mental test grades. In other words, the regression equation calculated from the records of
those already hired may be used in selecting from future applicants those most likely
to succeed.
The low value of $r$ (= 0.417) suggests that such prediction is not very reliable. The 95%
limits for $\rho$ are about 0.31 and 0.51, but even if we accept the observed $r$, the improvement
in prediction due to regression is only about 9% (see Table 58, §15.9).
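
The figures of Example 7 can be reproduced from the summary statistics alone. The sketch below assumes $s_x = 8.40$ and $s_y = 17.41$, the assignment that is consistent with the quoted slope 0.864 and with the standard error of estimate 15.8 used in §15.13; the function name is our own.

```python
import math

def regression_from_summary(x_bar, y_bar, s_x, s_y, r):
    """Intercept, slope and standard error of estimate of y on x."""
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    s_ey = s_y * math.sqrt(1.0 - r * r)
    return a, b, s_ey

# Summary statistics quoted in Example 7 (assignment of s_x, s_y assumed).
a, b, s_ey = regression_from_summary(x_bar=42.17, y_bar=87.31,
                                     s_x=8.40, s_y=17.41, r=0.417)
predicted = a + b * 32.5            # column centered at x = 32.5
improvement = 1.0 - math.sqrt(1.0 - 0.417 ** 2)
print(round(a, 2), round(b, 3), round(s_ey, 1),
      round(predicted, 2), round(improvement, 3))
```

Small differences in the last digit from the printed values arise because the text rounds the slope to 0.864 before computing the intercept.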

15.12 Variance of Estimate for a Correlation Table. We can define the
variance of the dots in a column about the regression value $Y_x$ for the center
of that column by

(15.59)    $s_{y \cdot x}^2 = \sum_y (y - Y_x)^2 f_{xy}/f_x$

For example, in Table 61, for the column $u = -2$,

    $Y_x = 0.864(32.5) + 50.88 = 78.96$,    $f_x = 32$

and

    $s_{y \cdot x}^2 = [1(115 - 78.96)^2 + 6(105 - 78.96)^2 + \cdots + 3(55 - 78.96)^2]/32 = 254.7$

We now prove the following theorem:

Theorem 6. The arithmetic mean of the $s_{y \cdot x}^2$, each weighted with its own
column frequency $f_x$, is equal to $s_{ey}^2 = s_y^2(1 - r^2)$.

Proof: From (15.59)

    $\dfrac{1}{N}\sum_x f_x s_{y \cdot x}^2 = \dfrac{1}{N}\sum_{xy} f_{xy}(y - Y_x)^2$

and, as in §15.6, this reduces to $s_y^2(1 - r^2)$.

15.13 Normal Correlation Surface. A correlation table may be idealized
into a surface in somewhat the same way that a histogram is idealized into a
frequency curve. The concept of a surface relates to the universe from
which the observed data of the table may be regarded as a sample. Let the
dimensions of the cells of a table be $\Delta x$ and $\Delta y$, and suppose columns are
erected upon these cells with altitudes proportional to the frequencies in the
cells. The result is a sort of solid histogram called a stereogram. Then as
$\Delta x \to 0$, $\Delta y \to 0$, $N \to \infty$, the tops of the columns approach as a limit a smooth
surface which is called a correlation surface. Our discussion will be confined
to the case where we may assume that this limit is a normal correlation
surface. In discussing this surface it is convenient to let $x$ and $y$ represent
deviations from the respective means and to let $z = f(x, y)$ denote the
frequency function representing the surface. Such a surface is shown in Fig. 68.

Any section of this surface parallel to the $yz$-plane is a normal curve and
represents the distribution in a column at $x$. Similarly any section parallel
to the $xz$-plane representing a row is a normal curve. The frequency in a
cell is measured by that portion of the volume under the surface which lies
over that cell. All those cells in which the frequency is a fixed value lie on
an ellipse. That is, if contour lines are drawn on the surface joining the
points of equal height above the base they will be ellipses. In other words,
sections of the surface parallel to the $xy$-plane are ellipses.

Fig. 68


We will digress here for a brief discussion of an ellipse. We may think of 
an ellipse as a transitional figure between a circle and a straight line, as the 
circle flattens out. That is to say, the limiting form of an ellipse is a circle 
at one extreme of the flattening process and a straight line segment at the 
other extreme. The degree of flatness is called the eccentricity of the ellipse, 
and it is proved in analytic geometry that the eccentricity varies from zero
in the case of a circle to unity when the ellipse degenerates into a line. All 
ellipses having the same eccentricity whatever their size have the same rela- 
tive proportions and are therefore similar in form. 

The eccentricity of the elliptical contours of different normal correlation
surfaces varies with the amount of correlation existing in the corresponding
universe. A surface with narrow elliptical contours represents a universe in
which there is high correlation, whereas if the variables are completely
independent in the probability sense the contour lines are circles when the
variables are expressed in standard units and when the same scale is used for
both axes.

Fig. 69

If the variables are not expressed in standard units (and $\rho = 0$) then the
contour lines may be ellipses, but their major and minor axes will coincide
with the $x$- and $y$-axes as in Figure 69. When $\rho \ne 0$ the axes of the ellipses
make an angle with the $xy$-axes; their major axis cuts quadrants I and III
in the $xy$-plane if $\rho > 0$ (as in Fig. 68) and quadrants II and IV if $\rho < 0$.

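The equation of a normal correlation surface is given by

(15.60)    $f(x, y) = \dfrac{1}{2\pi\sigma_x\sigma_y(1 - \rho^2)^{1/2}} \exp\left\{-\dfrac{1}{2(1 - \rho^2)}\left[\dfrac{x^2}{\sigma_x^2} - \dfrac{2\rho xy}{\sigma_x\sigma_y} + \dfrac{y^2}{\sigma_y^2}\right]\right\}$

where $x$ and $y$ represent the correlated variables
referred to their respective means as origin.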

By means of (15.60) an observed distribution may be fitted with the
appropriate normal surface assuming that the sample might reasonably have come
from such a universe. This is accomplished by replacing $\sigma_x$, $\sigma_y$, and $\rho$ in
(15.60) by the corresponding statistics calculated from the sample, multiplying
by $N$, and taking the origin at the mean of the table. Let us assume that
an observed distribution has been graduated by such a surface and the theoretical
cell frequencies obtained. The surface extends to infinity in the $xy$-plane, but
contour ellipses can be obtained which will enclose any desired percentage of the
given frequency when these ellipses are projected orthogonally onto the $xy$-plane.
They are all concentric, similar, and similarly placed. Fig. 70 represents such an
ellipse, say the smallest one necessary to enclose all the given cells. The systems
of perpendicular chords represent the columns and rows of the table.

Fig. 70

The graduated frequencies for each column are normal distributions whose
means lie on the true regression line of $y$ on $x$ and whose standard deviations
are in each case given by $\sigma_{ey} = \sigma_y(1 - \rho^2)^{1/2}$. To state the same thing in
a slightly different way, an array of $y$'s corresponding to a fixed value $x_i$ of $x$
is a normal distribution whose mean deviates from zero by $\rho(\sigma_y/\sigma_x)x_i$ and
whose standard deviation is $\sigma_{ey} = \sigma_y(1 - \rho^2)^{1/2}$, which is independent of $x_i$
and therefore is the same for all such arrays. Similarly an array of $x$'s
corresponding to a particular value $y_i$ of $y$ is a normal distribution with a mean
which deviates from zero by $\rho(\sigma_x/\sigma_y)y_i$, and a standard deviation of $\sigma_{ex} =
\sigma_x(1 - \rho^2)^{1/2}$, which is independent of $y_i$ and therefore is the same for all such
arrays. A careful study of Fig. 68 will help in understanding what is meant
by these statements. Cf. the four assumptions in §15.7.

For the normal correlation surface the column and row means fall exactly
on the regression lines, so that the variance about the regression line is the
same as the column variance, $s_{y \cdot x}^2$. This is a special case of Theorem 6, in
which all the quantities being averaged are equal to each other. The
distribution is homoscedastic (see §15.7).

The probability of a deviation of an observed $y$ (for a given $x$) from the
predicted value $\eta$ (given by the true regression line) is found from the table
of the normal law by expressing the deviation in standard units. If

(15.61)    $z = \dfrac{y - \eta}{\sigma_y(1 - \rho^2)^{1/2}}$

the probability of a deviation as large numerically as $z$ is given by

(15.62)    $P = \dfrac{2}{(2\pi)^{1/2}}\int_{|z|}^{\infty} e^{-z^2/2}\,dz$

In practice, $\eta$ is replaced by $Y$, and $\sigma_{ey}^2$ by $\hat\sigma_{ey}^2 = N s_{ey}^2/(N - 2)$. The
distribution of $z$ is not precisely normal, but for large $N$ may be taken as normal
with little error. For the data of Example 7, $s_{ey} = 15.8$, so that, assuming
that the population is adequately represented by a normal bivariate surface,
there is a probability of about 0.68 that the predicted value of $y = 79.0$ for
$x = 32.5$ will not be in error by more than 16 either way.

15.14 Best-fitting Straight Line When Both Variates Are Subject to Error.
If the purpose of the regression line is to express the relationship between
x and y (and not to predict one variate, given the other), then when y alone is
subject to error we should use the regression line of y on x, and when x alone 
is subject to error we should use the regression line of x on y. When both 
variates are subject to error, neither of these lines may give the best fit. 

Karl Pearson suggested the fitting of a straight line such that the sum of 
squares of deviations perpendicular to the line is a minimum. This may be 
satisfactory when we can use the same scale for both the variables concerned 
(as when correlating the stature of fathers and sons), but since in general 
the line depends on the choice of units for x and y it has no fundamental 
significance. Other methods require a prior knowledge of the standard 
deviations of the errors, and this knowledge is often not available. A. Wald 
(Reference 5) has suggested a simple method of fitting, in which the standard 
deviations of these errors can be estimated from the observations. 

The $N$ variates $x_i$, $y_i$ are supposed to be of the form

(15.63)    $x_i = \xi_i + d_i$,    $y_i = \eta_i + e_i$,    $i = 1, 2, \ldots, N$

where the true values $\xi_i$, $\eta_i$ are connected by the linear relation

(15.64)    $\eta_i = \alpha + \beta\xi_i$

and the $d_i$ and $e_i$ are random variates. The $d_i$ are uncorrelated and have a
common variance $\sigma_d^2$, the $e_i$ are uncorrelated and have a common variance
$\sigma_e^2$, and the $d_i$ are uncorrelated with the $e_i$.

For convenience we suppose that $N$ is even ($= 2m$). The observations
are divided into two equal groups, $i$ from 1 to $m$, and from $m + 1$ to $N$, after
being arranged in the order

    $x_1 \le x_2 \le \cdots \le x_N$

If the arithmetic mean of the first group is $(\bar{x}_1, \bar{y}_1)$ and that of the second
group $(\bar{x}_2, \bar{y}_2)$, the estimates of $\beta$ and $\alpha$ are

(15.65)    $\hat\beta = (\bar{y}_2 - \bar{y}_1)/(\bar{x}_2 - \bar{x}_1)$

(15.66)    $\hat\alpha = \bar{y} - \hat\beta\bar{x}$

where

    $\bar{x} = (\bar{x}_1 + \bar{x}_2)/2$,    $\bar{y} = (\bar{y}_1 + \bar{y}_2)/2$

and, furthermore, if $s_x^2$, $s_y^2$ are the sample variances and $s_{xy}$ the sample
covariance for $x$ and $y$, estimates of $\sigma_d^2$ and $\sigma_e^2$ are

(15.67)    $\hat\sigma_d^2 = [s_x^2 - s_{xy}/\hat\beta]\,N/(N - 1)$

and

(15.68)    $\hat\sigma_e^2 = [s_y^2 - \hat\beta s_{xy}]\,N/(N - 1)$

All these estimates are consistent, that is to say, the probability that they
differ from the true values by less than any given amount, however small,
tends to one as $N \to \infty$.

The ordering of the observations by the size of the $x_i$ is not quite
independent of the errors, unless these errors are small compared with the average
interval between the $x_i$, but in practice this arrangement is usually satisfactory.


Example 8. Applying Wald's method to the data of Example 2, §15.1, we have

    $\bar{x}_1 = (8 + 12 + 12 + 15 + 15 + 18)/6 = 13.3$

    $\bar{x}_2 = (19 + 22 + 26 + 26 + 29 + 38)/6 = 26.7$

    $\bar{y}_1 = (34 + 38 + 37 + 52 + 46 + 37)/6 = 40.7$

    $\bar{y}_2 = (22 + 14 + 37 + 22 + 20 + 25)/6 = 23.3$

so that

    $\hat\beta = -\dfrac{17.4}{13.4} = -1.30$

    $\hat\alpha = 32 - 20\hat\beta = 58.0$

and the line of best fit is

    $Y = 58.0 - 1.30X$

This line is roughly midway between the two regression lines in Fig. 61.
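
Wald's grouping estimates (15.65) and (15.66) are simple enough to sketch in code. The function name below is our own, and the twelve data pairs are an assumption: they are the values of Example 8 as read from a partly illegible printing, adjusted so that the group sums agree with the printed means.

```python
def wald_line(x, y):
    """Estimates (alpha, beta) of the line eta = alpha + beta * xi.

    The pairs are ordered by x and split into a lower and an upper
    half; the slope joins the centroids of the two halves.
    """
    pairs = sorted(zip(x, y), key=lambda p: p[0])
    if len(pairs) % 2 != 0:
        raise ValueError("this sketch assumes an even number of pairs")
    m = len(pairs) // 2
    lower, upper = pairs[:m], pairs[m:]
    x1 = sum(p[0] for p in lower) / m
    y1 = sum(p[1] for p in lower) / m
    x2 = sum(p[0] for p in upper) / m
    y2 = sum(p[1] for p in upper) / m
    beta = (y2 - y1) / (x2 - x1)
    alpha = (y1 + y2) / 2 - beta * (x1 + x2) / 2
    return alpha, beta

# Assumed data of Example 8 (see the note above).
X = [8, 12, 12, 15, 15, 18, 19, 22, 26, 26, 29, 38]
Y = [34, 38, 37, 52, 46, 37, 22, 14, 37, 22, 20, 25]
alpha, beta = wald_line(X, Y)
print(round(alpha, 1), round(beta, 2))
```

Tied values of $x$ cause no difficulty here because no tie straddles the dividing point between the two groups; where one does, the allocation of the tied pairs is a further choice that the method as stated does not settle.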


15.15 Which Regression Should Be Used for Prediction? It does not follow
that the equation which best expresses the relationship between $x$ and $y$ is
the best to use for predicting one variable, given a value of the other. If both
$x$ and $y$ are subject to errors which are normally distributed, and if the true
value $\xi$ of $x$ is also a random normal variate, then the best predicting equation
for $y$ is the ordinary regression line of $y$ on $x$, even though this line is not the
best expression of the relation between $x$ and $y$.

There are different situations which may arise in practice. Sometimes the
$x$ and $y$ values are measured on a random sample from a population and then
the dots in the scatter diagram are scattered in both the $x$ and the $y$ directions.
If so, we can safely use the regression of $y$ on $x$ for predicting $y$ for a value of
$x$ to be observed in the future, and the regression of $x$ on $y$ for predicting $x$
for a future value of $y$, whether or not the variables are subject to error.

Often, however, the $x$ values are not random but selected values, and the
$y$ values are the only ones that can be regarded as random. We can still
use the regression of $y$ on $x$ for estimating $y$, given $x$, but what are we to do
about estimating $x$, given $y$? This is a situation which arises frequently in
biological experiments. For example, in assay work it may be necessary to
estimate the potency of some drug preparation in terms of biological response,
with the aid of a response curve based on known doses of a standard drug.
The only possible regression is that of response on dosage. The other has
practically no meaning.

Dealing for convenience with a large sample, we can suppose that for any 
assigned x the ^/’s are normally distributed with mean a + hx and variance 
SeJ^. Then the probability that y lies within the limits a + bx — 1.96 
and a + bx + 1.96 is 0.95. Of all possible pairs of (x, y) values, 95% 
will lie within the strip shown in 
Fig. 71, so that if, for a given Pj we 
assert that the corresponding x lies 
in the strip — in other words, that 
X lies between (?/ — a — l,96s«i,)/6 
and (y -- a + 1.96 Sey)/b ~ we 
stand a 95% chance of being right. 

In the long run, if we make many such assertions for all possible values
of x and y, 95% of these assertions will be true, and these limits are
therefore the 95% confidence limits for x. It should be understood, however,
that x is not a random variable, and for a particular y the statement about
x is either true or false.
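
The limits just described can be written out in a few lines. The following is a minimal sketch in Python, assuming a large sample (so that the normal multiplier 1.96 applies) and a non-zero slope; the function name and the numerical inputs, a hypothetical assay line with a = 2.0, b = 0.5 and s_ey = 0.4, are illustrative assumptions and are not taken from the text.

```python
def inverse_prediction_limits(y, a, b, s_ey, z=1.96):
    """Large-sample limits for x, given an observed y and the line Y = a + b*x.

    The limits are (y - a -/+ z*s_ey) / b, as in the text; they are sorted
    so that the lower limit comes first even when the slope b is negative.
    """
    if b == 0:
        raise ValueError("slope b must be non-zero to solve for x")
    first = (y - a - z * s_ey) / b
    second = (y - a + z * s_ey) / b
    return min(first, second), max(first, second)


# Hypothetical response curve: response = 2.0 + 0.5 * dose, with s_ey = 0.4.
low, high = inverse_prediction_limits(y=7.0, a=2.0, b=0.5, s_ey=0.4)
print(round(low, 3), round(high, 3))
```

The interval is a statement about the procedure, not about any one x: repeated over many observed y's, about 95% of such intervals would cover the corresponding x.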

A discussion by C. P. Winsor of this problem will be found in Reference
6, and more details on the modifications required when the sample is small
are given by C. Eisenhart (Reference 7).

Exercises

1. Prove that (15.8) is equivalent to (15.6).

2. Write out the details of fitting the regression line of x on y by least squares, and
obtaining equation (15.12).

3. The following data represent the ages of husband (x) and wife (y) for twenty couples. 
Calculate both regression lines and plot them on a dot diagram. 


X 

22 

24 

_ 1 

26 

26 

27 

27 

28 

28 

29 

30 

30 

30 

31 

32 

33 

34 

35 

35 

36 

37 

y 

18 


20 

24 

22 

24 

27 

24 

21 

25 

29 

32 

27 

27 


27 

30 

31 

30 

32 


Ans. Y = 0.89X - 0.57,

X = 0.82y + 8.6.

4. Work out the estimated values Y for the given x of Exercise 3. Form the differences $d_i$
and compute $\sum d_i$ and $\sum d_i^2$.

5. Find the value of r for the data of Exercise 3. Ans. r = 0.866.

6. Compute $s_{ey}^2$ by (15.26) for the data of Exercise 3, and compare with the value
$(\sum d_i^2)/20$ obtained from Exercise 4.

7. In studying a set of pairs of related variates, a statistician has completed the
preliminary arithmetic and obtained the following results:

N = 100; $\sum x^2 = 1,585,000$; $\sum x = 12,500$; $\sum xy = 1,007,425$; $\sum y^2 = 648,100$;
$\sum y = 8,000$. Find $\bar{x}$, $s_x$, r.

8. Given the following results for the heights and weights of 1000 men students:

$\bar{y} = 68.00$ in., $\bar{x} = 150.00$ lb, r = 0.60,
$s_y = 2.50$ in., $s_x = 20.00$ lb

John Doe weighs 200 lb, Richard Roe is five feet tall.

Estimate the height of Doe from his weight, and the weight of Roe from his height.

Ans. Doe's height = 71.75 in.

Roe's weight = 111.6 lb

9. (a) Given the following:

$\sum x = 150,000$, $\sum x^2 = 22,725,000$, $\sum xy = 10,522,600$,

$\sum y = 70,000$, $\sum y^2 = 4,936,000$, N = 1000.

Find $\bar{x}$, $\bar{y}$, $s_x$, $s_y$, r, and the lines of regression.

(b) Suppose the data in (a) refer to the weight in pounds (x) and the height in inches (y)
of a sample of 1000 policemen. Suppose Paul Private weighs 160 lb and Saul Sergeant is
6 ft tall. Estimate the height of Private and the weight of Sergeant.

10. The following table contains the grades made on two tests by 25 students in
mathematics. Find $\bar{x}$, $\bar{y}$, $s_x$, $s_y$, r for these data.

x   88   96   68   73   75   88   57   68   62   79   73   74   78
y   82   86   75   78   72   79   63   66   67   75   68   70   79

x   80   67   66   69   74   78   72   59   47   56   67   43
y   78   61   58   65   69   68   83   80   42   43   48   47

Ans. 69.8, 67.6, 12.1, 12.7, 0.79
11. In the following anthropometric measurements on a random sample of twenty
male freshmen, taken from the Physical Education Department, x represents height, y
represents chest measurement, both measurements being taken to the nearest tenth of an
inch, and z represents weight to the nearest pound. Find the coefficient of correlation
(a) between x and y, (b) between x and z, (c) between y and z.

   x      y      z         x      y      z
 68.6   33.6    148      65.3   33.0    136
 67.2   35.0    144      66.1   34.0    144
 67.7   30.2    145      64.8   37.3    170
 63.8   30.0    108      69.6   33.4    164
 69.9   33.0    130      68.2   31.5    122
 64.7   31.0    112      68.8   32.0    141
 68.4   33.0    134      72.3   35.0    169
 66.4   30.2    112      67.8   33.7    134
 69.1   33.3    143      71.3   31.5    136
 71.0   32.3    136      63.5   33.6    126

12. What equation is the equivalent mathematical statement for the following words?

If the respective deviations in each series, x and y, from their means were expressed in
units of standard deviations (that is, if each were divided by the standard deviation of the
series to which it belongs) and plotted to a scale of standard deviations, the slope of a
straight line best describing the plotted points would be the correlation coefficient r.

13. Given the standard deviations $s_x$, $s_y$ for two correlated variates x and y, in a large
sample:

(a) What is the standard error in estimating y from x if r = 0?

(b) By how much is this error reduced if r is increased to 0.25?

(c) How large must r be in order to reduce $s_{ey}$ to one-half of its value for r = 0?

(d) How large must r be to reduce $s_{ey}$ to one-third its value for r = 0?

14. Is it true that a correlation coefficient of r = 0.6 indicates a relationship twice as
close as r = 0.3?

15. (For students with some knowledge of analytical geometry.) Show that the tangent
of the angle between the two regression lines (15.10) and (15.12) is

$\tan\theta = \frac{s_x s_y}{s_x^2 + s_y^2} \cdot \frac{1 - r^2}{r}$

and between the lines (15.23) and (15.24) is $\tan\theta = (1 - r^2)/2r$. What are the values of
$\theta$ for r = 1 and for r = 0?


16. The marks of a class of 12 students on a Christmas test (x) and on the final
examination (y) are as follows:

Student    A    B    C    D    E    F    G    H    I    J    K    L
x         41   45   60   68   47   77   90  100   80  100   40   43
y         60   63   60   48   85   56   63   91   74   98   66   43

Estimate the final mark of a student who obtained 60 on the Christmas test but was ill at
the time of the final examination. What is the standard error of this estimate?

Ans. 65, 16.
17. Specimens of similarly treated alloy steels containing various percentages of nickel
are tested for toughness with the following results, where x denotes the toughness (in
arbitrary units) and y the percentage of nickel:

x    47    50    52    52    54    56    58    59    60    60    62    64    65    66
y   2.5   2.7   2.8   2.8   2.9   3.2   3.2   3.3   3.4   3.6   3.6   3.6   3.7   3.8

Find the coefficient of correlation between the percentage of nickel and the toughness, as
measured by the test. What is the standard error of estimate for percentage of nickel
estimated from the toughness? Ans. r = 0.9916, $s_{ey}$ = 0.065.

18. The coefficient of correlation between dividends per share in 1935 and the low price
of the shares during 1935 was found to be 0.817 for 64 American corporation stocks. Is this
coefficient (a) significantly different from zero? (b) Significantly different from 0.76?
(c) Significantly different from another sample value of 0.76 calculated from another
39 corporations?

Hint. For (b) and (c) transform to the variable z' and treat z' as normal.

19. Establish 95% confidence limits for the slope of the regression line of y on x in
Exercise 3.

20. Obtain from the chart in the Appendix 95% confidence limits for $\rho$, in a population
of married couples from which the twenty couples in Exercise 3 can be regarded as a random
sample. Also transform r to the approximately normal variate z' and obtain the 95%
confidence limits from the normal law.

21 . In Table 59, (page 269) evaluate the following expressions: 

(a) For X = 82, 

ZI/viT) ^yfryy Sm 

(b) For y « 87, 

X X 

22. For the data of Table 68 (page 310) find $\bar{x}$, $\bar{y}$, $s_x$, $s_y$, and r.

Ans. $\bar{x}$ = 138.45 lb, $\bar{y}$ = 67.82 in., $s_x$ = 19.4 lb, $s_y$ = 2.7 in., r = 0.50.

23. For the data of Table 68, find the equations of the regression lines of x on y and of
y on x. Ans. Y = 0.070x + 58.1; X = 3.50y - 99.2.

24. Compute the value of the standard error of estimate for the first regression line in
Exercise 23. Plot the regression line and the approximate 95% band lying along this line.

25. Referring to Exercise 8, assume that the data refer to a random sample from a normal
bivariate population describing the heights and weights of senior men students in colleges
and universities of the United States and Canada. Determine the probability that Doe's
height is outside the interval 65.75 to 77.75 in., and the probability that Roe's weight is
between 100.8 and 122.4 lb. Ans. 0.0027; 0.5 (approx.).

26. Prove that a section parallel to the yz plane of the normal bivariate surface, with
equation (15.50), is a normal curve with its mean on the regression line of y on x and with
variance $\sigma_y^2(1 - \rho^2)$.

Hint. Write (15.50) as $f = K e^{-P}$, where $P = (u^2 - 2\rho uv + v^2)/2(1 - \rho^2)$, $u = x/\sigma_x$,
$v = y/\sigma_y$. The trace of the surface in the plane $u = u_1$ is given by putting $u = u_1$. Then
$f = C e^{-T}$, where $C = K e^{-u_1^2/2}$ and $T = (v - \rho u_1)^2/2(1 - \rho^2) = (y - m)^2/2\sigma_y^2(1 - \rho^2)$, where
$m = \rho x_1 \sigma_y/\sigma_x$.

27. Obtain the best-fitting straight line for the data of Exercise 16 by Wald's method. 
Estimate by this method the standard deviations of the errors in x and y. 





28. The following query and answer appeared in Biometrics Bulletin, vol. 1, no. 3, pp.
38*^7. Investigate the references cited in the answer and justify the procedure which is
recommended (under the given hypothesis).

Query. A problem that has bothered me is the fitting of regression lines when their
position is restricted in some way. For example, suppose a test is made of the relationship
between the number of fish present in a body of water and the average number which can be
caught out of it, with a standard amount of fishing. In fitting a regression line to such data,
we know that the point (0, 0) must fall on the line, since if no fish are present certainly none
will be caught. In other words, we have one point which is free from sampling error. The
unique importance of this point will, it seems to me, make observations in its neighborhood
of relatively less importance than observations at a distance from it, where there is no
fixed guide-post. Do you know of any treatment of situations of this sort, by which the
best straight (or curved) line could be fitted to data where there is one point which must be
satisfied? The standard deviation from regression ("standard error of estimate") and the
standard error of the regression would also be available. Or are these concepts pertinent
in such a question?

Answer. Deming (§15 and §11 of reference 3, §0.4) gives both a general method and
some particular solutions of your problem. Snedecor (reference 14) opens his Chapter 6 
with an illustration of the simple case in which x is measured without error and the variance 
of y is constant for all values of x. 

Observations in the neighborhood of (0, 0) may or may not be of less importance than 
those at greater distances; it depends on the variance of y. One often finds that this variance 
increases with x. In fact, there are many situations in which it seems reasonable to suppose 
that in the sampled population the standard deviation of y is directly proportional to x. 
If you think this hypothesis is suitable in your fishing, the appropriate method is to calculate 
the ratios x/y where x is the number of fish caught and y is the total number of fish, then 
apply to them the statistical procedure suitable for a single variate. — George W. Snedecor. 
CHAPTER XVI

FURTHER TOPICS IN CORRELATION

16.1 Reliability and Validity of Tests. In educational and psychological
work, and also in the hiring of employees, considerable use is now made of
various forms of mental and aptitude tests. It is desirable to know whether
the results of such tests are (a) reliable, in the sense that a high degree of
confidence can be placed in the score made by a candidate, and (b) valid, in
the sense that an individual score on the test actually measures the ability
or aptitude which it is supposed to measure. The validity of a test is
estimated by the correlation of the score results with an accepted criterion of
the ability in question. Thus if a test is supposed to indicate aptitude for
the practice of medicine and is administered to students seeking entrance to
a medical school, the validity of the test is ascertained by correlating the
results for a group of students with the actual success of these students in the
work of the medical school after they have been admitted.

The reliability of a test is judged by the correlation of the results with those
of a repetition of the test, or with an alternative form of the test, on the
same group of candidates. Difficulties arise in practice, however, since it
cannot be assumed that a person's psychological state either remains constant
in time or is unaffected by the test itself. There may be a practice effect,
leading the candidate to do better on a second attempt, or, if the tests are too
close together, there may be a fatigue effect leading him to do worse. Also, if
two alternative forms of the test are given, it is hard to be sure that the two
are of exactly the same standard of difficulty.

A method which is statistically preferable is to give one test and split it
into halves, finding the coefficient of correlation r for, say, the odd-numbered
and the even-numbered items on the test, care being taken to balance these
items as far as practicable for length and difficulty. The reliability of the
full test is measured by (2r)/(1 + r).
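
The step from the half-test correlation to the full-test reliability is a one-line calculation. The sketch below, in Python, applies the expression (2r)/(1 + r), commonly known as the Spearman-Brown formula; the function name and the sample value r = 0.6 are illustrative assumptions only.

```python
def full_test_reliability(r_half):
    """Reliability of the full test from the correlation r between its two halves."""
    if r_half <= -1:
        raise ValueError("r must be greater than -1")
    return 2 * r_half / (1 + r_half)


# A hypothetical split-half correlation of 0.6 corresponds to a
# full-test reliability of 2(0.6)/(1.6).
print(full_test_reliability(0.6))
```

The full-test figure always lies at or above the half-test correlation for positive r, which reflects the fact that a longer test averages out more of the error of measurement.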

If $x_i$ is the score made by the ith individual on one test (or half-test) and
$y_i$ is his score on the other test (or half-test), the ordinary coefficient of
correlation is given by $r = s_{xy}/(s_x s_y)$, but this is not always the best estimate to
use for the population coefficient of correlation. If the two scores can be
supposed to have equal standard deviations $\sigma_x$ and $\sigma_y$ in the population, the
best estimate of $\rho$ is

(16.1)    $\rho_1 = 2 s_{xy}/(s_x^2 + s_y^2)$

and if the two scores can be supposed to have also the same means $\mu_x$ and $\mu_y$,
the best estimate is

(16.2)    $\rho_2 = \frac{4 s_{xy} - (\bar{x} - \bar{y})^2}{2(s_x^2 + s_y^2) + (\bar{x} - \bar{y})^2} = \frac{2\sum xy - (\sum x + \sum y)^2/2N}{\sum x^2 + \sum y^2 - (\sum x + \sum y)^2/2N}$

The estimate (16.1) can be used when we want the correlation between a test
and a re-test or between two alternative forms of the test. The estimate
(16.2) is preferable for the split-half method of testing.


Example 1. In the following table x and y are the scores on two forms of the same test,
made by a group of 20 pupils (d is the difference y - x).

Table 62

Pupil No.     1     2     3     4     5     6     7     8     9    10
x             9    15    10    40    19    17    18    15    24    24
y            14    22    19    37    20    34    19    20    29    24
d             5     7     9    -3     1    17     1     5     5     0

Pupil No.    11    12    13    14    15    16    17    18    19    20
x            13    13    19    16    41    35    32    20    12    17
y            28    16    28    16    46    30    41    24    11    22
d            15     3     9     0     5    -5     9     4    -1     5

We find $\bar{x} = 20.45$, $\bar{y} = 25.00$, $s_x^2 = 85.55$, $s_y^2 = 80.10$, $s_{xy} = 68.36$, r = 0.826.
The estimate $\rho_1$ is 0.826, so that there is very little difference between this and r.
The estimate $\rho_2$ is 0.718.
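
The arithmetic of Example 1 can be checked with a short script. The following is a sketch in Python that computes r and the two estimates (16.1) and (16.2) from the scores of Table 62, using divisor-N variances as in the text. The function and variable names are assumptions made for illustration, and pupil 8's x is entered as 15, which is the value consistent with the printed difference d = 5 and with the printed variance of x.

```python
from math import sqrt

# Scores from Table 62 (pupil 8's x taken as 15; see the note above).
x = [9, 15, 10, 40, 19, 17, 18, 15, 24, 24, 13, 13, 19, 16, 41, 35, 32, 20, 12, 17]
y = [14, 22, 19, 37, 20, 34, 19, 20, 29, 24, 28, 16, 28, 16, 46, 30, 41, 24, 11, 22]


def reliability_estimates(x, y):
    """Return (r, rho1, rho2), with moments computed using divisor N."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum(v * v for v in x) / n - mean_x ** 2
    var_y = sum(v * v for v in y) / n - mean_y ** 2
    cov_xy = sum(a * b for a, b in zip(x, y)) / n - mean_x * mean_y
    gap_sq = (mean_x - mean_y) ** 2
    r = cov_xy / sqrt(var_x * var_y)
    rho1 = 2 * cov_xy / (var_x + var_y)                            # (16.1)
    rho2 = (4 * cov_xy - gap_sq) / (2 * (var_x + var_y) + gap_sq)  # (16.2)
    return r, rho1, rho2


r, rho1, rho2 = reliability_estimates(x, y)
print(round(r, 3), round(rho1, 3), round(rho2, 3))
```

With these inputs r and the second estimate agree with the printed 0.826 and 0.718 to three decimals, while the first estimate comes out near 0.825, within rounding of the printed figure. The drop from the first estimate to the second reflects the sizeable difference between the two means, which (16.2) treats as disagreement between the forms.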

The differences of scores on the two forms of a test, or on the two halves of a
split test, can be used to estimate the reliability of the test, the assumption
being that the differences are due to errors of measurement. If d = y - x,
$\sum d^2 = \sum x^2 - 2\sum xy + \sum y^2$. Also $(\sum d)^2/N = [(\sum x)^2 - 2(\sum x)(\sum y) + (\sum y)^2]/N$,
so that the variance of the differences $d_i$ is given by

$N s_d^2 = \sum d^2 - (\sum d)^2/N$

$= N(s_x^2 + s_y^2 - 2 s_{xy})$

Since $s_{xy} = r s_x s_y$, this can be written

(16.3)    $s_d^2 = s_x^2 + s_y^2 - 2 r s_x s_y$

$= (s_x - s_y)^2 + 2(1 - r) s_x s_y$.

If r = 0 and $s_x = s_y$, $s_d^2 = 2 s_x^2$, and if r = 1 and $s_x = s_y$, $s_d^2 = 0$. We
therefore take $(\tfrac{1}{2} s_d^2)^{1/2}$ as a measure of the lack of reliability. In the example above,
$\sum d^2 = 993$, $\sum d = 91$, $s_d^2 = 28.95$, $(\tfrac{1}{2} s_d^2)^{1/2} = 3.80$. This may be compared
with $[(s_x^2 + s_y^2)/2]^{1/2} = 9.10$, which is an average standard deviation of the
scores themselves.

16.2 Analysis of Variance of Test Scores. The most satisfactory method
of dealing with the question of reliability is probably to carry out an analysis
of variance, which enables us to separate the parts of the variance due to
individual differences between the students, to the practice effect, and to
errors of measurement. The practice effect is estimated by the difference
$\bar{y} - \bar{x}$, and the individual effect by the variation between the individual
mean scores on the two tests.

If we regard the whole set of 2N scores as a single distribution, the sum of
squares of deviations from the mean (called the total sum of squares) is
given by

(16.4)    $S_T = \sum x^2 + \sum y^2 - (\sum x + \sum y)^2/2N$

and, when divided by the number of degrees of freedom 2N - 1, this gives
an estimate of variance of the population of scores.

The mean score of an individual on the two tests is z = (x + y)/2, and if
the two sets of scores are independent random samples from the same
population, an estimate of the population variance is given by twice the estimated
variance of z. The sum of squares for individuals is, therefore,

(16.5)    $S_1 = 2[\sum z^2 - (\sum z)^2/N]$

$= (\sum x^2 + \sum y^2 + 2\sum xy)/2 - (\sum x + \sum y)^2/2N$

and the number of degrees of freedom is N - 1.

On the same hypothesis as before of independent random samples, the
variance of $\bar{x}$ or $\bar{y}$ is the population variance divided by N, so that another
estimate of the population variance is provided by N times the estimate of the
variance of the means. As there are only two means, their estimated
variance is $[(\bar{x})^2 + (\bar{y})^2 - (\bar{x} + \bar{y})^2/2]/(2 - 1) = (\bar{y} - \bar{x})^2/2 = (\bar{d})^2/2$. The
sum of squares for the practice effect is, therefore,

(16.6)    $S_2 = N\bar{d}^2/2 = (\sum y - \sum x)^2/2N$

Finally, the error variance is estimated by one-half the variance of the
differences d = y - x, and the sum of squares for error is

(16.7)    $S_3 = [\sum d^2 - (\sum d)^2/N]/2 = N s_d^2/2$

with N - 1 degrees of freedom. Since

$S_3 = (\sum x^2 + \sum y^2 - 2\sum xy)/2 - (\sum y - \sum x)^2/2N$

it is easily proved from the foregoing expressions that

(16.8)    $S_T = S_1 + S_2 + S_3$

Table 63 gives the analysis of variance for the data of Example 1. On the
null hypotheses that there are no real individual differences and no real
practice effect, the ratio of the first and third estimates of variance in column 4,
and also the ratio of the second and third estimates, are distributed as F.
From Table IV of the Appendix, we see that the 5% and 1% significance
levels of F for 1 and 19 degrees of freedom are 4.38 and 8.18, so that the
observed ratio 207.0/15.2 = 13.6 is highly significant. There is therefore
a well-marked practice effect. The levels for 19 and 19 degrees of freedom
are 2.16 and 3.03, so that the observed ratio 159/15.2 = 10.5 is highly
significant. A large value for this ratio implies a reliable test, as it indicates that
differences between individuals are large compared with the error.

Table 63. Analysis of Variance for Scores on Alternative Forms of a Test

Variation Due to           Sum of Squares   Degrees of Freedom   Estimate of Variance

Individual differences         3023.5              19                  159
Practice effect                 207                 1                  207
Error                           289.5              19                   15.2
Total                          3520                39                   90.3


For further details on the reliability of tests, the reader may consult 
Reference 1. 
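
The entries of Table 63 follow directly from the sums of squares defined in this section. The sketch below, in Python, computes the four sums of squares and the two variance ratios from the scores of Table 62; the function name and dictionary keys are assumptions introduced for illustration, and pupil 8's x is entered as 15, the value consistent with the printed difference d = 5.

```python
# Scores from Table 62 (pupil 8's x taken as 15; see the note above).
x = [9, 15, 10, 40, 19, 17, 18, 15, 24, 24, 13, 13, 19, 16, 41, 35, 32, 20, 12, 17]
y = [14, 22, 19, 37, 20, 34, 19, 20, 29, 24, 28, 16, 28, 16, 46, 30, 41, 24, 11, 22]


def score_anova(x, y):
    """Split the total sum of squares of 2N paired scores into three parts."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_sq = sum(v * v for v in x) + sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    correction = (sum_x + sum_y) ** 2 / (2 * n)
    total = sum_sq - correction                            # total, 2N - 1 d.f.
    individuals = (sum_sq + 2 * sum_xy) / 2 - correction   # individuals, N - 1 d.f.
    practice = (sum_y - sum_x) ** 2 / (2 * n)              # practice effect, 1 d.f.
    error = (sum_sq - 2 * sum_xy) / 2 - practice           # error, N - 1 d.f.
    return {"individuals": individuals, "practice": practice,
            "error": error, "total": total}


table = score_anova(x, y)
n = len(x)
error_variance = table["error"] / (n - 1)
f_practice = table["practice"] / error_variance
f_individuals = (table["individuals"] / (n - 1)) / error_variance
print({k: round(v, 1) for k, v in table.items()})
print(round(f_practice, 1), round(f_individuals, 1))
```

The three component sums of squares add up to the total, which is the identity the section proves, and the two ratios are the quantities referred to the F table with (1, 19) and (19, 19) degrees of freedom.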

16.3 Rank Correlation. It is sometimes possible to place a group of
individuals in order with respect to some characteristic without giving a
definite numerical score to each individual. For instance, a judge may have
to rank a group of bathing beauties for a contest, or a sales manager may
rank a group of salesmen in order of efficiency. The rank is a variate which
takes (except for possible ties) only the values 1, 2, ..., N. The mean $\bar{x}$ is
therefore (N + 1)/2, and the variance is given by

(16.9)    $s_x^2 = (\sum x^2)/N - \bar{x}^2$

$= (N + 1)(2N + 1)/6 - (N + 1)^2/4 = (N^2 - 1)/12$

Suppose now that the same individuals are ranked in two ways (by different
judges, or on the basis of different characteristics), and that the rank of the
ith individual is $x_i$ on the first ranking and $y_i$ on the second. If $d_i = y_i - x_i$,
we have seen in §16.1 that the variance of d is given, for any pair of variates
x and y, by

$s_d^2 = s_x^2 + s_y^2 - 2 r s_x s_y$

so that

(16.10)    $r = \frac{s_x^2 + s_y^2 - s_d^2}{2 s_x s_y}$

If x and y are ranks, $s_x^2$ and $s_y^2$ are both given by (16.9), and since $\bar{x} = \bar{y}$,
$s_d^2$ is given by $(\sum d^2)/N$. On substituting in (16.10) we get

(16.11)    $r = \frac{(N^2 - 1)/6 - (\sum d^2)/N}{(N^2 - 1)/6} = 1 - \frac{6\sum d^2}{N(N^2 - 1)}$

This is known as Spearman's formula for rank correlation. It is the Pearson
product-moment correlation coefficient for the ranks, treated as ordinary
variates. For fairly small samples, less than 40, say, r is easier to compute
by the rank method than by the exact method and thus is sometimes used
even when the actual variate values are available.


Example 2. For the data of Example 1, where x and y now denote ranks on the two
forms of the same test, we have

Pupil No.      1      2      3      4      5      6      7      8      9     10
x             20   14.5     19      2    8.5   11.5     10   14.5    5.5    5.5
y             19   11.5   15.5      3   13.5      4   15.5   13.5      6    9.5
d^2            1      9  12.25      1     25  56.25  30.25      1   0.25     16

Pupil No.     11     12     13     14     15     16     17     18     19     20
x           16.5   16.5    8.5     13      1      3      4      7     18   11.5
y            7.5   17.5    7.5   17.5      1      5      2    9.5     20   11.5
d^2           81      1      1  20.25      0      4      4   6.25      4      0

When, as frequently happens, there are ties in the rankings, it is customary to divide the
corresponding rank numbers equally among the values concerned, using fractions where
necessary. Thus if the 11th and 12th are equal they are both given the rank 11.5.
However, Spearman's formula is no longer precisely equivalent to the product-moment
correlation coefficient, since $N s_x^2$, for example, is not equal to $N(N^2 - 1)/12$, if there are
any ties in the x-ranking.

For the preceding data we find $\sum d^2 = 273.5$, N = 20, $r = 1 - 1641/7980 = 0.794$. The
product-moment coefficient for the scores themselves is 0.826.
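
The ranking with ties and Spearman's formula can be sketched as follows. This Python fragment assigns rank 1 to the highest score, shares tied rank numbers equally as described above, and then applies the formula $1 - 6\sum d^2/[N(N^2 - 1)]$ to the scores of Example 1. The function names are illustrative assumptions, and pupil 8's x is entered as 15, the value consistent with the tied rank 14.5 shown for pupils 2 and 8.

```python
# Scores from Table 62 (pupil 8's x taken as 15; see the note above).
x = [9, 15, 10, 40, 19, 17, 18, 15, 24, 24, 13, 13, 19, 16, 41, 35, 32, 20, 12, 17]
y = [14, 22, 19, 37, 20, 34, 19, 20, 29, 24, 28, 16, 28, 16, 46, 30, 41, 24, 11, 22]


def average_ranks(values):
    """Rank 1 goes to the highest value; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the last position holding the same value as position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1   # positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Return (sum of squared rank differences, Spearman's rank correlation)."""
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return sum_d2, 1 - 6 * sum_d2 / (n * (n * n - 1))


sum_d2, r_rank = spearman(x, y)
print(sum_d2, round(r_rank, 3))
```

Because of the ties, the value from the formula is only approximately the product-moment correlation of the ranks, as the example itself notes.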


The significance of an observed value of r may be estimated from the fact
that if x and y are the ranks of independent random samples of size N from the
same population, which need not be normal, the variance of the observed r
is 1/(N - 1). If N is fairly large the distribution is approximately normal.
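
As a rough illustration of this test, the sketch below (Python) converts an observed rank correlation into a standard normal deviate by dividing by its standard error, $1/\sqrt{N - 1}$, and finds the two-sided tail probability. The function name is an assumption, and applying the normal approximation to N = 20 is itself an assumption, since the text only claims approximate normality when N is fairly large.

```python
from math import erfc, sqrt


def rank_correlation_z(r, n):
    """Normal deviate and two-sided tail probability for a rank correlation r.

    Under independence the variance of r is taken as 1/(n - 1).
    """
    z = r * sqrt(n - 1)
    p_two_sided = erfc(abs(z) / sqrt(2))
    return z, p_two_sided


# Value from Example 2: r = 0.794 with N = 20.
z, p = rank_correlation_z(0.794, 20)
print(round(z, 2), p)
```

A deviate of this size lies well beyond the usual 1% point of the normal curve, so the rank correlation of Example 2 would be judged significant on this approximate test.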
A different method of obtaining a rank correlation coefficient has been given
by Kendall. (See Part Two, §8.17, and Reference 3.) This coefficient has
a smaller sampling variance than Spearman's r, and its distribution tends
more rapidly to the normal as N increases.

16.4 Parabolic Regression. In Chapter XIV we discussed the fitting
of a straight line trend to a time series by the method of least squares, and we
also dealt with certain curved trends which by a change of variable could be
reduced to the linear form. Sometimes, however, we need to fit a curved
trend line which cannot be so reduced, and the simplest curve is the second
degree parabola.

For this curve, and in fact for any polynomial, the method of least squares
gives the same result as the method of moments. If the best-fitting parabola
is given by

(16.12)    $Y = a + bx + cx^2$

the statistics a, b, and c can be calculated from the equations

(16.13)    $\sum y = Na + b\sum x + c\sum x^2$

$\sum xy = a\sum x + b\sum x^2 + c\sum x^3$

$\sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4$

which express the equality of the zeroth, first, and second moments of y and Y
(see §14.8). These equations are obtainable also from the least squares
condition

(16.14)    $\sum (y - a - bx - cx^2)^2 = \text{minimum}$

by differentiating partially with respect to a, b, and c.

If the values of x are equally spaced, with a common interval h, and if we
introduce the new variable

(16.15)    $u = (x - \bar{x})/h$

then, as in §14.11, the u are consecutive integers if N is odd, or half-integers
differing by unity if N is even. In either case $\sum u^2 = N(N^2 - 1)/12 = m$,
say, and $\sum u^4 = N(N^2 - 1)(3N^2 - 7)/240 = n$, say. The equations for
a, b, and c are now much simplified. They are

(16.16)    $Na + mc = \sum y$

$mb = \sum uy$

$ma + nc = \sum u^2 y$

When N is even, it is convenient to double the u values so as to avoid fractions.
Example 3. Table 64 gives the number of divorces per 1000 marriages in the United
States, 1900-1930. Fit a parabolic trend line to these data.

Here N = 7, m = 7(48)/12 = 28

n = 7(48)(140)/240 = 196

(The values of m and n are checked by the column totals $\sum u^2$ and $\sum u^4$.) The equations
for a, b, c are

7a + 28c = 808
28b = 465
28a + 196c = 3415

From the second of these, b = 16.607, and, from the first and third, a = 747/7 = 106.71,
c = 61/28 = 2.1786. The regression equation is therefore

$Y = 106.7 + 16.6u + 2.18u^2$

where u = (x - 1915)/5. Computed values of Y are given in the last column of Table 64.
Table 64. Divorces per 1000 Marriages, U.S.A., 1900-1930

Year       y      u      uy    u^2    u^2 y    u^4       Y

1900      79     -3    -237      9      711     81     76.5
1905      81     -2    -162      4      324     16     82.2
1910      88     -1     -88      1       88      1     92.3
1915     104      0       0      0        0      0    106.7
1920     134      1     134      1      134      1    125.5
1925     148      2     296      4      592     16    148.6
1930     174      3     522      9     1566     81    176.1

Total    808            465     28     3415    196

Source: Statistical Abstract of the United States, 1951.
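
The fitting procedure for equally spaced x can be sketched in a few lines. The Python fragment below forms the coded variable u, accumulates the sums that appear in the simplified normal equations, and solves them for a, b and c using the divorce figures above. The function name is an assumption made for illustration, and the local name q stands for the quantity the text calls n (the sum of the fourth powers of u), to avoid a clash with the number of observations.

```python
years = [1900, 1905, 1910, 1915, 1920, 1925, 1930]
divorces = [79, 81, 88, 104, 134, 148, 174]


def fit_parabola(x, y):
    """Fit Y = a + b*u + c*u**2, with u = (x - mean(x)) / h, for equally spaced x."""
    n = len(x)
    h = x[1] - x[0]                      # common interval
    x_bar = sum(x) / n
    u = [(xi - x_bar) / h for xi in x]
    m = sum(ui ** 2 for ui in u)         # sum of u^2
    q = sum(ui ** 4 for ui in u)         # sum of u^4 (n in the text)
    s_y = sum(y)
    s_uy = sum(ui * yi for ui, yi in zip(u, y))
    s_u2y = sum(ui ** 2 * yi for ui, yi in zip(u, y))
    b = s_uy / m                                      # from m*b = sum(u*y)
    c = (n * s_u2y - m * s_y) / (n * q - m * m)       # eliminate a from the other two
    a = (s_y - m * c) / n
    return a, b, c, u


a, b, c, u = fit_parabola(years, divorces)
fitted = [a + b * ui + c * ui ** 2 for ui in u]
print(round(a, 2), round(b, 3), round(c, 4))
print([round(v, 1) for v in fitted])
```

Because the odd-order sums of u vanish, b separates from a and c, which is why the three equations reduce to one equation in b and a pair in a and c.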


Fig. 72 shows the fit of the curve to the data and also illustrates very well the
dangers of extrapolation. The actual values for 1935 and 1940 fall a long way
below the trend line, whereas the value for 1945 is fairly close.

The geometrical meaning of the constants a, b, c is indicated in the diagram.
We see that a is the ordinate at u = 0, b the slope of the tangent to the curve
at u = 0, and c the difference of ordinates (at u = 1) between the curve and
the tangent at u = 0. If the curve is concave upward, c is positive; if concave
downward, c is negative.

Fig. 72. Parabolic Trend Line

16.5 Correlation Index for Non-linear Regression. We have seen in
§15.6 that when the regression is linear the Pearson coefficient of correlation
r is given by

$r^2 = s_Y^2/s_y^2 = 1 - s_{ey}^2/s_y^2$

where $s_Y^2$ is the variance of the computed Y's and $s_y^2$ that of the observed y's,
and where $s_{ey}^2$ is the variance of the y's about the regression line.

When the regression is curved, we may define a correlation index $r_c$ by the
expression $r_c = s_{yY}/(s_y s_Y)$ and show that

(16.17)    $r_c^2 = s_Y^2/s_y^2 = 1 - s_{ey}^2/s_y^2$

where now the Y's are given by the equation of the best-fitting curved line,
and $s_{ey}^2$ is the variance of the observed y's about this line. The value of the
correlation index depends, of course, on the particular trend line chosen, but
for a given curve it is an indication of the closeness with which the observed
points cluster around this line.
For a parabolic trend, the sum of squares of deviations from regression is

$N s_{ey}^2 = \sum (y - a - bu - cu^2)^2$

and, as in the proof of Theorem 4, §15.6, we can show that

(16.18)    $N s_{ey}^2 = \sum y^2 - a\sum y - b\sum uy - c\sum u^2 y$

Since $N s_y^2 = \sum y^2 - (\sum y)^2/N$, we have from (16.17)

$r_c^2 = (N s_y^2 - N s_{ey}^2)/N s_y^2$

$= [a\sum y - (\sum y)^2/N + b\sum uy + c\sum u^2 y]/N s_y^2$

On substituting for a from the first equation of (16.16) this becomes

(16.19)    $r_c^2 = [b\sum uy + c(\sum u^2 y - m\sum y/N)]/N s_y^2$

For the data of Table 64, we have

$N s_y^2 = \sum y^2 - (\sum y)^2/N = 101498 - 93266 = 8232$

$r_c^2 = 8121/8232 = 0.9865$

so that $r_c = 0.993$. If we fitted a straight line to the data, we should find

$r^2 = (\sum uy)^2/m N s_y^2 = 0.9381$, r = 0.969

The fit about the parabola seems to be definitely better than that about
the straight line. This is confirmed by analyzing the variance. The total
sum of squares for y about $\bar{y}$ is $N s_y^2 = 8232$. The sum of squares about the
straight regression line is $N s_y^2(1 - r^2) = 510$ and that about the parabolic
regression line is $N s_y^2(1 - r_c^2) = 111$. The reduction due to the parabolic
regression is, therefore, 399.

The significance of these sums of squares can be estimated by an analysis
of variance, as in Table 65. In making an estimate of population variance
about parabolic regression we divide the sum of squares by N - 3 (instead of
by N - 2, as for linear regression). Three degrees of freedom are lost since
three constants for the parabola are estimated from the data. The estimated
variance is, therefore, 111/4 = 28. Since the reduction in sum of squares
due to the parabolic (over the linear) regression is 399, with 1 degree of
freedom, the ratio 399/28 = 14.4 is to be compared with the 5% and 1% values
of F with 1 and 4 degrees of freedom. These values are 7.71 and 21.2, so
that there is a significant reduction. That is to say, parabolic regression is
definitely better than linear regression at the 5% level of significance.
Table 65

                                                  Sum of      Degrees of    Estimate of
                                                  Squares      Freedom       Variance

Total ($N s_y^2$)                                   8232           6           1372
Linear regression ($N s_Y^2$)                       7722           1
About straight regression line ($N s_{ey}^2$)        510           5            102
Parabolic regression ($N s_Y^2$)                    8121           2
About parabolic regression line ($N s_{ey}^2$)       111           4             28
Reduction due to parabolic regression                399           1            399
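
The sums of squares in this analysis can be reproduced directly from the coded data of Table 64. The following Python sketch computes the correlation index for the parabola, the squared correlation for the straight line, and the variance ratio that tests the extra term; the variable names are illustrative assumptions, and q again stands for the sum of fourth powers of u.

```python
y = [79, 81, 88, 104, 134, 148, 174]      # divorces per 1000 marriages
u = [-3, -2, -1, 0, 1, 2, 3]              # coded years, u = (x - 1915)/5

n = len(y)
m = sum(ui ** 2 for ui in u)
q = sum(ui ** 4 for ui in u)
s_y = sum(y)
s_uy = sum(ui * yi for ui, yi in zip(u, y))
s_u2y = sum(ui ** 2 * yi for ui, yi in zip(u, y))

b = s_uy / m
c = (n * s_u2y - m * s_y) / (n * q - m * m)

ss_total = sum(yi ** 2 for yi in y) - s_y ** 2 / n       # N * s_y^2
ss_linear = s_uy ** 2 / m                                # removed by the straight line
ss_parabolic = b * s_uy + c * (s_u2y - m * s_y / n)      # removed by the parabola

r_squared = ss_linear / ss_total
rc_squared = ss_parabolic / ss_total

residual_parabolic = ss_total - ss_parabolic             # N - 3 degrees of freedom
reduction = ss_parabolic - ss_linear                     # 1 degree of freedom
f_ratio = reduction / (residual_parabolic / (n - 3))

print(round(r_squared, 4), round(rc_squared, 4), round(f_ratio, 1))
```

The ratio is referred to the F table with 1 and N - 3 degrees of freedom; with only seven points the test has little power, so a non-significant result would not by itself establish linearity.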


16.6 Curves of Column and Row Means. When we are dealing with
data grouped in a correlation table, the general nature of the regression may
be estimated by plotting the column (or row) means and joining them by
straight lines. We saw in §15.11 that the ordinary regression line of y on x
gives the best-fitting straight line to the column means (weighted with the
column frequencies), but of course the line of column means might show a
well-marked departure from linearity. A simple example is provided by the
table of Fig. 73, where the column means are indicated by black dots and
the row means by crosses. The dots lie on a straight line, but the crosses
do not.

Fig. 73
The variance of the y values in a column about the corresponding column
mean is defined by

(16.20)    $s_{y \cdot x}^2 = \frac{1}{f_x} \sum_y f_{xy} (y - \bar{y}_x)^2$

where $\bar{y}_x = \sum_y y f_{xy}/f_x = y_0 + kV/f_u$ in the notation of §15.10; then, by
analogy with the definitions of r and $r_c$, we can define a correlation ratio $E_{yx}$
by the relation

(16.21)    $1 - E_{yx}^2 = (\sum_x f_x s_{y \cdot x}^2)/N s_y^2$

If the column means lie on a straight line, $\sum_x f_x s_{y \cdot x}^2$ is the same as $N s_{ey}^2$ for linear
regression and then $E_{yx}^2$ is the same as $r^2$ ($= 1 - s_{ey}^2/s_y^2$). In general, $E_{yx}^2$
is greater than $r^2$, and the greater the departure of the line of column means
from linearity, the greater the difference.

An expression similar to (16.21) can be written for $E_{xy}^2$, where $E_{xy}$ is the
second correlation ratio, which depends upon the scatter of the observations
about the line of row means. Although $r_{xy}$ is always equal to $r_{yx}$, this
symmetry is not characteristic of the correlation ratio, and in general $E_{xy}$ is
different from $E_{yx}$.

16.7 Calculation of Correlation Ratios for Grouped Variates. We first
prove that

(16.22)    $E_{yx}^2 = s_{\bar{y}}^2/s_y^2$

which may be compared with (15.28) and shows that $E_{yx}^2$ is the ratio of the
variance of the column means to the variance of the y's. The numerator
is defined by

(16.23)    $N s_{\bar{y}}^2 = \sum_x f_x (\bar{y}_x - \bar{y})^2$

To prove (16.22) we need to show that

(16.24)    $N s_y^2 = \sum_x f_x s_{y \cdot x}^2 + N s_{\bar{y}}^2$

and this follows from Theorem 3 of §6.12. It is, in fact, merely equation
(6.17) in a new notation. Dividing (16.24) through by $N s_y^2$, and using
(16.21), we obtain

$1 = 1 - E_{yx}^2 + s_{\bar{y}}^2/s_y^2$

which is equivalent to (16.22).

In the notation of §15.10, x and y are replaced by u and v, where $x = x_0 + hu$,
$y = y_0 + kv$. Then

$s_y^2 = k^2 s_v^2$

and

$N s_{\bar{y}}^2 = k^2 \sum_u f_u (V/f_u - \bar{v})^2$

but $\sum_u f_u (V/f_u - \bar{v})^2 = \sum_u (V^2/f_u) - N\bar{v}^2$

so that

(16.25)    $E_{yx}^2 = [\sum_u (V^2/f_u) - N\bar{v}^2]/N s_v^2$

This is the formula used to calculate $E_{yx}$. The values of V and of $f_u$ are
required in the ordinary process of finding r for a correlation table. We need
merely another row along the bottom of the table (see Table 60 of §15.10)
giving values of $V^2/f_u$. If the column means are to be plotted we need also
$V/f_u$ and a row for this can conveniently be provided first. The values of
$V^2/f_u$ are then given by multiplying $V/f_u$ by V.

Example 4. For the data of Tables 59 and 60 (pages 269, 272) we have

$E_{yx}^2 = [93.29 - (0.54)^2 \cdot 100]/180.8 = 64.13/180.8 = 0.355$
$E_{yx} = 0.596$

This is a little greater than the value of r = 0.58 found previously, but probably not
enough to indicate a significant departure from linearity. A method of judging the
significance of this difference will be given in the next section.

The formula corresponding to (16.25) for the second correlation ratio is

(16.26)    $E_{xy}^2 = [\sum_v (U^2/f_v) - N\bar u^2]/(N s_u^2)$

which is the same as (16.25) with $u$ and $v$ (and $U$ and $V$) interchanged.
For Table 60, we have

    $f_v$      $U$     $U/f_v$    $U^2/f_v$
      7       11      1.57      17.29
     18       24      1.33      32.00
     28      -15     -0.536      8.04
     23      -18     -0.783     14.09
     18      -14     -0.778     10.89
      5      -13     -2.60      33.80
      1       -3     -3.00       9.00
    100      -28               125.11

    $E_{xy}^2 = [125.11 - (0.28)^2\,100]/278.2 = 0.422$
    $E_{xy} = 0.649$

which suggests a greater departure from linearity for the row means than for the
column means.
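The calculation of the second correlation ratio can be checked with a short Python sketch. It is offered only as an illustration, not as part of the original text. The function name `correlation_ratio_sq` is hypothetical, and the readings U = -15 (third row) and f_v = 5 (sixth row) are assumptions, chosen because they reproduce the tabulated values of U/f_v and U²/f_v and the totals 100 and -28. The value 278.2 for N s_u² is taken from the text.

```python
import math

# Row frequencies f_v and row totals U of the coded variate u.
# Assumed readings: U = -15 in the third row and f_v = 5 in the sixth row.
f_v = [7, 18, 28, 23, 18, 5, 1]
U = [11, 24, -15, -18, -14, -13, -3]
N_s_u2 = 278.2  # N * s_u^2, as quoted in the text


def correlation_ratio_sq(freqs, totals, n_times_variance):
    """Squared correlation ratio from array frequencies and array totals."""
    n = sum(freqs)
    mean = sum(totals) / n
    sum_sq_over_f = sum(t * t / f for t, f in zip(totals, freqs))
    return (sum_sq_over_f - n * mean ** 2) / n_times_variance


E_xy_sq = correlation_ratio_sq(f_v, U, N_s_u2)
E_xy = math.sqrt(E_xy_sq)
print(E_xy_sq, E_xy)
```

Because the sketch carries full precision instead of the two-decimal table entries, its result may differ from the printed value in the third decimal place.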


Just as the correlation coefficient $r$ for a sample is an estimate of the true
correlation coefficient $\rho$ for the population from which the sample is taken,
so the correlation ratios are estimates of the true ratios, and tables
have been prepared from which the significance of observed values of $E_{yx}$ or
$E_{xy}$ can be estimated. For further details and references, see Part Two
(Chapter XI).

It may be well to mention that the value of $E_{yx}$ is not independent of the
classification of the data. As the class intervals become narrower, $E_{yx}$
approaches unity. This may be understood from (16.21). If the grouping
were so fine that only one item appeared in each column, then it would
constitute the mean of that column. In this case $s_{ay}^2$ would be zero and $E_{yx}$
would therefore be unity. On the other hand, a very coarse grouping tends
to make the value of $E_{yx}$ approach $r$. "Student" has given a formula for "The
Correction to be Made in the Correlation Ratio for Grouping" in Biometrika,
vol. IX, pp. 316-320.

16.8 Test for Linearity of Regression with Grouped Variates. The
weighted sum of squares for column means, (16.23), can be split up into a
part depending on linear regression and a part depending on the deviation
from linear regression, and this circumstance enables us to use the F test for
the significance of the deviation from linear regression.

Let $Y_x$ be the value of the computed $y$ for a column, from the linear
regression

    $Y_x = a + bx$

Then    $\bar y_x - \bar y = (\bar y_x - Y_x) + (Y_x - \bar y)$

so that

(16.27)    $\sum_x f_x(\bar y_x - \bar y)^2 = \sum_x f_x(\bar y_x - Y_x)^2$
               $+ \sum_x f_x(Y_x - \bar y)^2 + 2\sum_x f_x(\bar y_x - Y_x)(Y_x - \bar y)$

By (16.22) and (16.23), the left-hand side of (16.27) is equal to $N s_y^2 E_{yx}^2$.
Since $\bar y = a + b\bar x$,

    $\sum_x f_x(Y_x - \bar y)^2 = b^2\sum_x f_x(x - \bar x)^2 = N b^2 s_x^2 = N s_y^2 r^2$

Also, the last term on the right-hand side of (16.27) vanishes, as may be
shown by writing it as

    $2b\sum_x f_x(\bar y_x - a - bx)(x - \bar x)$
        $= 2b(\sum_x x f_x \bar y_x - a\sum_x x f_x - b\sum_x x^2 f_x)$
          $- 2b\bar x(\sum_x f_x \bar y_x - a\sum_x f_x - b\sum_x x f_x)$
        $= 0$, by (15.63) and (15.64)

$a$ and $b$ being the same as the $a_1$ and $b_1$ in those equations. Therefore we have
from (16.27)

(16.28)    $\sum_x f_x(\bar y_x - Y_x)^2 = N s_y^2 E_{yx}^2 - N s_y^2 r^2$
                                    $= N s_y^2(E_{yx}^2 - r^2)$

This is the part of the sum of squares for column means which is not accounted 
for by linear regression. If this is large, compared with what we might 
expect from random sampling variation, we reject the hypothesis that the 
regression is really linear. 

The comparison is made with the sum of squares within the columns, that
is, with $N s_{ay}^2$, which, by (16.21), is equal to $N s_y^2(1 - E_{yx}^2)$.

The number of degrees of freedom for variation in a column is $f_x - 1$, and,
if there are $p$ columns, the total number of degrees of freedom for the
variation within columns is $\sum_x (f_x - 1) = N - p$. Also, since there are $p$ column
means fitted with a linear regression line, the number of degrees of freedom
for variation from regression is $p - 2$. It can be shown that, if the
parent population is uncorrelated, the ratio of $[N s_y^2(E_{yx}^2 - r^2)]/(p - 2)$ to
$[N s_y^2(1 - E_{yx}^2)]/(N - p)$ has the F distribution with $p - 2$ and $N - p$
degrees of freedom. A significant value of F indicates a significant departure
from linearity.

For the example given before,

    $E_{yx}^2 = 0.355$,  $r^2 = 0.337$,  $N = 100$,  $p = 7$

so that

    $F = (0.018)(93)/[(0.645)(5)] = 0.52$

This, being less than 1, is clearly not significant. For the other line we
can use the same expression for F but with $E_{xy}^2$ instead of $E_{yx}^2$ and with
$q$ (the number of rows) instead of $p$. For the same example, $E_{xy}^2 = 0.422$,
$r^2 = 0.337$, $N = 100$, $q = 7$, so that

    $F = (0.085)(93)/[(0.578)(5)] = 2.74$

The 5% level for 5 and 93 degrees of freedom is about 2.31, so that there is
a significant departure from linearity in the curve of row means, that is, in
the regression of x on y.
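The two F ratios can be reproduced with a few lines of Python. This is a minimal sketch for checking the arithmetic. The function name `linearity_F` is hypothetical, and the critical value 2.31 is simply the figure quoted in the text, not one computed here.

```python
def linearity_F(E_sq, r_sq, n, k):
    """F ratio for departure from linear regression.

    E_sq : squared correlation ratio
    r_sq : squared correlation coefficient
    n    : total number of observations
    k    : number of arrays (columns p, or rows q)
    """
    between = (E_sq - r_sq) / (k - 2)   # deviation from regression
    within = (1.0 - E_sq) / (n - k)     # variation within arrays
    return between / within


F_columns = linearity_F(0.355, 0.337, 100, 7)
F_rows = linearity_F(0.422, 0.337, 100, 7)
CRITICAL_5_PERCENT = 2.31  # quoted in the text for 5 and 93 degrees of freedom

print(round(F_columns, 2), F_columns > CRITICAL_5_PERCENT)
print(round(F_rows, 2), F_rows > CRITICAL_5_PERCENT)
```

The common factor $N s_y^2$ cancels between numerator and denominator, which is why only the ratios and the degrees of freedom are needed.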

16.9 Some General Remarks on Correlation. The relationship between 
the correlation coefficient and the correlation ratio may be clarified by Fig. 74. 

For completely random scattering of the dots, with no trend, $r$ and $E$ are
both zero ($E$ stands for either $E_{yx}$ or $E_{xy}$). If the dots lie precisely on a
straight line, $r = 1$ and $E = 1$. If the dots lie on a curve as in Fig. 74(c),
such that no ordinate cuts it more than once, $E_{yx} = 1$, and if, furthermore,
the dots are symmetrically placed about the y-axis, $E_{xy} = 0$ and $r = 0$.
In Fig. 74(d) the dots scatter around a definitely curved trend line, and
$E_{yx} > r$.

Although statistical theory gives a description of the indicated relationship
between two related variables, the interpretation of the results "abounds in
pitfalls easily overlooked by the unwary, while they are cantering gaily along
upon their arithmetic."

[Fig. 74. Scatter diagrams; panel (c): $r = 0$, $E_{xy} = 0$]

The methodological side has been developed until we can find correlation coefficients by
simply turning a crank, but the explanation of the meaning of the result after we find it,
needs a brain . . . No amount of mathematical training and ability can take the place
of the judgment and common sense that comes from a knowledge of the field in which the
problem lies. (Reference 3.)

In the interpretation of r one should avoid imputing any causal relation- 
ship between the variables. In this connection the following pungent remarks 
of Professor E. B. Wilson (Reference 4) may be appropriately quoted: 

Correlation is a mutual affair between two numerical variables; the correlation coefficient r
is symmetrical with respect to them. Strictly, y is not correlated with x or x with y, but
x and y are correlated. Theory is very important in indicating what facts should be looked
for as significant; facts are significant or important largely as they indicate theory, but
neither compels the other, as the histories of theorizing and of fact finding amply
demonstrate . . . Further, the value of the correlation coefficient depends on the group for which
it is determined or on the universe of which that group is a fair sample. The correlation
coefficient r of height and weight for a group containing humans from infancy to adult life
would be different from, and in fact greater than, the coefficient for college students or for
the members of a football squad; there is no such thing as the correlation coefficient per

16.10 Contingency. Frequently in biological or psychological experimental
work we encounter characteristics or attributes which are not susceptible
of accurate measurement, although it is possible to divide the population
into two or more classes with respect to these attributes. We might, for
example, divide a population into "right-handed," "left-handed," and
"ambidextrous," or into "fair-haired," "red-haired," "brown-haired," and
"black-haired." A frequency table in which a sample from the population is classified
according to two different attributes is called a contingency table. It is like
a correlation table, except that the different columns and rows are not assigned
definite numerical values of variates x and y.


Table 66. Contingency Table

               $B_1$      $B_2$      $B_3$      Total
    $A_1$      $f_{11}$   $f_{12}$   $f_{13}$   $r_1$
    $A_2$      $f_{21}$   $f_{22}$   $f_{23}$   $r_2$
    $A_3$      $f_{31}$   $f_{32}$   $f_{33}$   $r_3$
    $A_4$      $f_{41}$   $f_{42}$   $f_{43}$   $r_4$
    Total      $c_1$      $c_2$      $c_3$      $N$

If we have two attributes A and B and if the sample is divided into four
A-categories, $A_1$, $A_2$, $A_3$, and $A_4$, and into three B-categories, $B_1$, $B_2$, and
$B_3$, we shall have a contingency table like Table 66. Here $f_{ij}$ is the frequency
of individuals in both the categories $A_i$ and $B_j$, $r_i$ is the marginal frequency
of the $A_i$ (the $i$th row total), $c_j$ is the marginal frequency of the $B_j$ (the $j$th
column total), and $N$ is the total number in the sample.

If the attributes A and B are independent, the probability that an individual
has the attribute $A_i$ is the same for all categories $B_j$, and therefore is the same
as the probability of $A_i$ in the population as a whole. The proportion of
individuals in the $B_j$ category with attribute $A_i$ should therefore be
approximately the same for all values of $j$ and for the right-hand margin, the
differences actually observed being merely sampling fluctuations. A similar
statement may be made about the proportion of individuals in the $A_i$ category
with attribute $B_j$. For Table 66, these statements mean that, for all $i$ and $j$,

    $f_{ij}/c_j \approx r_i/N$

    $f_{ij}/r_i \approx c_j/N$

and both are contained in the relation

(16.29)    $f_{ij} \approx r_i c_j/N$

(The symbol $\approx$ means "approximately equal to".)

Example 4. Table 67 gives some data on hair color and eye color for 6800 males from
Baden (in Germany). Is there any association between these two attributes?

Table 67. Hair Color and Eye Color

                                Hair Color
    Eye Color          Fair     Red    Brown    Black    Total
    Blue               1768      47      807      189     2811
    Gray or Green       946      53     1387      746     3132
    Brown               115      16      438      288      857
    Total              2829     116     2632     1223     6800

One method of judging whether association is present is to compare the proportions, say,
of black-haired people among the blue-eyed and among the brown-eyed, regarding these
two classes as independent random samples, and to test whether the difference of
proportions is significant according to the criterion given in Chapter XI.

These proportions are 189/2811 and 288/857. We can take, as an estimate of the
population proportion of black-haired men, the ratio $\theta = 1223/6800$ in the whole sample. The
variance of the difference of proportions in independent random samples of sizes 2811 and
857 is

    $\theta(1 - \theta)(1/2811 + 1/857) = 0.000225$

so that the standard deviation is 0.0150.
The actual difference of proportions is 0.2688 and the ratio of this to its standard
deviation is 0.2688/0.0150 = 17.9. Obviously the observed difference is far too large to be
accounted for as a sampling fluctuation and the conclusion is that black hair and brown
eyes are definitely associated. Other proportions from the table may be treated in the same
way. There is also, however, a method of testing for association in the table as a whole
and this method is explained in the next section.
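The comparison of the two proportions can be checked numerically. The following Python sketch is only an illustration of the arithmetic, using the pooled-proportion variance θ(1 - θ)(1/n₁ + 1/n₂) that is standard for this kind of test. The variable names are hypothetical.

```python
import math

# Black-haired men among the blue-eyed and among the brown-eyed (Table 67).
n_blue, black_blue = 2811, 189
n_brown, black_brown = 857, 288

# Pooled estimate of the population proportion of black-haired men.
theta = 1223 / 6800

# Variance and standard deviation of the difference of two sample proportions.
variance = theta * (1.0 - theta) * (1.0 / n_blue + 1.0 / n_brown)
std_dev = math.sqrt(variance)

difference = black_brown / n_brown - black_blue / n_blue
ratio = difference / std_dev

print(round(difference, 4), round(std_dev, 4), round(ratio, 1))
```

A ratio near 18 standard deviations is far beyond any conventional significance level, in line with the conclusion drawn in the example.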

16.11 Chi-square Test for Association. On the assumption that the
marginal distributions in Table 66 are fixed, the distribution among the cells
of the table has 6 degrees of freedom. In any row there are 3 cells, but, with
a fixed $r_i$, only 2 of these can be filled arbitrarily, the frequency in the third
being then determined. Similarly with the columns, only 3 cells in each are
freely adjustable. It can be proved that, under these conditions, and on the
null hypothesis that A and B are independent, the quantity

    $\chi_s^2 = \sum_{i,j} (f_{ij} - \phi_{ij})^2/\phi_{ij}$

has approximately the $\chi^2$ distribution* with 6 degrees of freedom, $\phi_{ij}$ being
the expected number in the $i$th row and $j$th column, given by

(16.30)    $\phi_{ij} = r_i c_j/N$

In the general table with $r$ rows and $c$ columns, the number of degrees of
freedom is $(r - 1)(c - 1)$.

By definition,

    $\chi_s^2 = \sum_{i,j} f_{ij}^2/\phi_{ij} - 2\sum_{i,j} f_{ij} + \sum_{i,j} \phi_{ij}$

But    $\sum_{i,j} f_{ij} = \sum_{i,j} \phi_{ij} = N$,

so that

(16.31)    $\chi_s^2 + N = \sum_{i,j} f_{ij}^2/\phi_{ij} = N\sum_{i,j} f_{ij}^2/(r_i c_j)$

This is a convenient formula for calculating $\chi_s^2$. For the data of Table 67,
we have

    $\chi_s^2/N + 1 = \frac{(1768)^2}{2829(2811)} + \frac{(946)^2}{2829(3132)} + \cdots + \frac{(288)^2}{1223(857)} = 1.158$

so that $\chi_s^2 = 1075$. This is, of course, a very large value for 6 degrees of
freedom, and the probability of obtaining as great a value on the null
hypothesis is practically zero. The null hypothesis is therefore decisively rejected.
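As a check, the statistic can be computed both from the shortcut formula (16.31) and directly from the expected frequencies (16.30). The Python sketch below is illustrative only. The two fair-hair cells for blue and brown eyes are taken as 1768 and 115, values inferred here from the row and column totals of Table 67, and the variable names are hypothetical.

```python
# Table 67: rows = eye color (blue, gray or green, brown),
# columns = hair color (fair, red, brown, black).
# Assumption: the cells 1768 and 115 are inferred from the marginal totals.
observed = [
    [1768, 47, 807, 189],
    [946, 53, 1387, 746],
    [115, 16, 438, 288],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
N = sum(row_totals)

# Shortcut (16.31): chi_s^2 + N = N * sum of f_ij^2 / (r_i * c_j)
ratio_sum = sum(
    f * f / (row_totals[i] * col_totals[j])
    for i, row in enumerate(observed)
    for j, f in enumerate(row)
)
chi2_shortcut = N * (ratio_sum - 1.0)

# Direct definition, with expected frequencies phi_ij = r_i * c_j / N (16.30)
chi2_direct = 0.0
for i, row in enumerate(observed):
    for j, f in enumerate(row):
        phi = row_totals[i] * col_totals[j] / N
        chi2_direct += (f - phi) ** 2 / phi

dof = (len(row_totals) - 1) * (len(col_totals) - 1)
print(round(ratio_sum, 3), chi2_shortcut, chi2_direct, dof)
```

Carried to full precision, the statistic comes out within a unit or two of the 1075 quoted, the small difference being due to rounding of the sum to three decimals. The conclusion is unaffected.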

16.12 Coefficient of Contingency. Karl Pearson proposed as a measure
of the association in a contingency table the coefficient

(16.32)    $C = [\chi_s^2/(\chi_s^2 + N)]^{1/2}$

which he called a "coefficient of mean square contingency." The larger
$\chi_s^2$, the nearer C is to 1, and the greater the degree of association. However,
C can never be equal to 1 even if there is perfect association between the
attributes, and the maximum value of C depends on the number of rows and
columns in the table. For a 4 × 4 classification, for instance, the greatest
possible value is 0.866. The utility of this coefficient is therefore rather
doubtful. For the data of Example 4,

    $C = [1075/7875]^{1/2} = 0.37$
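The coefficient and its upper limit can be illustrated in Python. This is a sketch only. The function names are hypothetical, and the limit $[(k - 1)/k]^{1/2}$ used for a square k × k table is the standard result for perfect association (where $\chi_s^2 = N(k - 1)$), stated here as background and not taken from the text.

```python
import math


def contingency_coefficient(chi2, n):
    """Pearson's coefficient of mean square contingency, formula (16.32)."""
    return math.sqrt(chi2 / (chi2 + n))


def max_contingency_coefficient(k):
    """Largest attainable C for a k x k table with perfect association."""
    return math.sqrt((k - 1) / k)


C = contingency_coefficient(1075, 6800)
C_max_4x4 = max_contingency_coefficient(4)
print(round(C, 2), round(C_max_4x4, 3))
```

Comparing C with its attainable maximum, not with 1, gives a fairer impression of the strength of association in a table of given size.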


* Compare equation (13.1).

16.13 The 2 × 2 Table. The commonest type of contingency table in
practice is that with 2 rows and 2 columns, generally known as a "two-by-two
table." The population is divided into two A-classes and also into two
B-classes. The number of degrees of freedom, with the marginal totals
fixed, is therefore only 1. An example of such a table was given in §9.2,
Table 32, to illustrate relative frequencies, the A-classes being "inoculated"
and "not-inoculated," and the B-classes "attacked by sickness" and
"not-attacked." If the frequencies in the four cells are denoted for convenience
by $a$, $b$, $c$, $d$, the value of $\chi_s^2$ for the table is given by

(16.33)    $\chi_s^2 = N(ad - bc)^2/(r_1 r_2 c_1 c_2)$

               $A_1$     $A_2$
    $B_1$      $a$       $b$       $r_1$
    $B_2$      $c$       $d$       $r_2$
               $c_1$     $c_2$     $N$

To prove this, we note that since the observed and expected frequencies have the
same marginal totals, the difference between the observed and expected
frequencies in a cell is the same for all cells, except for sign.

If the expected frequencies are $\alpha$, $\beta$, $\gamma$, $\delta$, corresponding respectively
to $a$, $b$, $c$, $d$, then $\alpha + \beta = a + b = r_1$, so that $a - \alpha = -(b - \beta)$, and
similarly for the other pairs.

Now $\alpha = r_1 c_1/N$, so that

    $a - \alpha = a - \frac{(a + b)(a + c)}{a + b + c + d} = \frac{ad - bc}{a + b + c + d} = (ad - bc)/N$

By definition,

    $\chi_s^2 = \frac{(a - \alpha)^2}{\alpha} + \frac{(b - \beta)^2}{\beta} + \frac{(c - \gamma)^2}{\gamma} + \frac{(d - \delta)^2}{\delta}$

and, as we have just seen, all the numerators are equal to $(ad - bc)^2/N^2$.
Therefore

    $\chi_s^2 = \frac{(ad - bc)^2}{N^2}\left[\frac{1}{\alpha} + \frac{1}{\beta} + \frac{1}{\gamma} + \frac{1}{\delta}\right]$

           $= \frac{(ad - bc)^2}{N}\left[\frac{1}{r_1c_1} + \frac{1}{r_1c_2} + \frac{1}{r_2c_1} + \frac{1}{r_2c_2}\right]$

           $= \frac{(ad - bc)^2}{N r_1 r_2 c_1 c_2}\,[r_2c_2 + r_2c_1 + r_1c_2 + r_1c_1]$

           $= \frac{(ad - bc)^2}{N r_1 r_2 c_1 c_2}\,[r_2N + r_1N]$

           $= \frac{(ad - bc)^2 N}{r_1 r_2 c_1 c_2}$

If we denote by $d_1$ the difference in proportions of $B_1$ in the categories $A_1$ and
$A_2$, and by $d_2$ the difference in the proportions of $A_1$ in the categories $B_1$ and
$B_2$, then

    $d_1 = a/c_1 - b/c_2 = \frac{a}{a + c} - \frac{b}{b + d} = \frac{ad - bc}{c_1c_2}$

    $d_2 = a/r_1 - c/r_2 = \frac{a}{a + b} - \frac{c}{c + d} = \frac{ad - bc}{r_1r_2}$

so that

(16.34)    $\chi_s^2 = N d_1 d_2$

This indicates clearly how $\chi_s^2$ depends on the degree of association between
the two attributes A and B.

For the data of Table 32, §9.2,

    $\chi_s^2 = 20(44)^2/[(8)(12)(13)(7)] = 4.43$

and the probability of a value of $\chi^2$ at least as great as this, with 1 degree of
freedom, is a little less than 0.04, so that the degree of association indicated
is apparently significant.
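Formulas (16.33) and (16.34) can be compared numerically with a short Python sketch. It is an illustration only. The cell frequencies 3, 5, 10, 2 are an assumption, chosen to be consistent with the marginal totals 8, 12, 13, 7 and with |ad - bc| = 44. The function names are hypothetical. The tail probability for 1 degree of freedom uses the relation between $\chi^2$ with 1 degree of freedom and the normal curve, via `math.erfc`.

```python
import math


def chi2_2x2(a, b, c, d):
    """Chi-square for a 2 x 2 table by formula (16.33)."""
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)


def chi2_from_proportions(a, b, c, d):
    """The same quantity written as N * d1 * d2, formula (16.34)."""
    n = a + b + c + d
    d1 = a / (a + c) - b / (b + d)
    d2 = a / (a + b) - c / (c + d)
    return n * d1 * d2


def p_value_1df(x):
    """Upper-tail probability of chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


# Assumed cell frequencies, consistent with margins 8, 12, 13, 7.
a, b, c, d = 3, 5, 10, 2
chi2 = chi2_2x2(a, b, c, d)
chi2_alt = chi2_from_proportions(a, b, c, d)
p = p_value_1df(chi2)
print(round(chi2, 2), round(chi2_alt, 2), round(p, 3))
```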

16.14 Yates' Correction. The distribution of $\chi^2$ is continuous, whereas
the distribution in a contingency table is discontinuous, the cell frequencies
being necessarily integers. The approximation of the $\chi_s^2$ distribution to a
$\chi^2$ distribution is better as N increases, but for moderate values of N the
approximation is, as a rule, much improved by a correction due to Yates.
This correction is analogous to that used in approximating the sum of terms
of a binomial distribution by the integral of a normal curve (see §11.1),
where the sum from, say, $x = a$ to $x = b$ is approximated by the integral
from $a - \frac{1}{2}$ to $b + \frac{1}{2}$. In the 2 × 2 table the correction consists in replacing
the frequency $d$ by $d + \frac{1}{2}$ if $ad < bc$ or by $d - \frac{1}{2}$ if $ad > bc$, the remaining
frequencies being adjusted accordingly, so as to keep the marginal totals
unaltered. In the foregoing example, the table, corrected and rearranged,
is as shown.

     3½     4½      8
     9½     2½     12
     13      7     20

The effect of the correction is to replace $(ad - bc)^2$ in the
calculation of $\chi_s^2$ by $(|ad - bc| - N/2)^2$,
and in this case the reduction is from $(44)^2$
to $(34)^2$. (This may be checked by noting
that $3\frac{1}{2}\cdot 2\frac{1}{2} - 4\frac{1}{2}\cdot 9\frac{1}{2} = \frac{1}{4}(35 - 171) = -34$.)

The new value of $\chi_s^2$ is 2.65, and the probability
of a value of $\chi^2$ as large as this is 0.104.

The correction has therefore changed a
probability which is significant at the 5% level to one which is non-significant.
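The effect of the correction can be verified with a small Python sketch, offered as an illustration only. The cell frequencies 3, 5, 10, 2 are assumed, consistent with the marginal totals 8, 12, 13, 7 and with |ad - bc| = 44. The function name `chi2_yates` is hypothetical.

```python
import math


def chi2_yates(a, b, c, d):
    """Chi-square for a 2 x 2 table with Yates' continuity correction."""
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    reduced = abs(a * d - b * c) - n / 2.0
    return n * reduced ** 2 / (r1 * r2 * c1 * c2)


# Assumed cell frequencies, consistent with margins 8, 12, 13, 7.
chi2_c = chi2_yates(3, 5, 10, 2)

# Two-tailed probability from chi-square with 1 degree of freedom.
p_two_tail = math.erfc(math.sqrt(chi2_c / 2.0))

# Equivalent normal deviate and its one-tailed probability.
chi_c = math.sqrt(chi2_c)
p_one_tail = 0.5 * math.erfc(chi_c / math.sqrt(2.0))

# Cross-check: cells adjusted by half a unit give ad - bc = -34.
adjusted_cross_product = 3.5 * 2.5 - 4.5 * 9.5

print(round(chi2_c, 2), round(p_two_tail, 3), round(chi_c, 2),
      round(p_one_tail, 3), adjusted_cross_product)
```

The one-tailed normal probability is half the two-tailed chi-square probability, since with 1 degree of freedom $\chi$ behaves as a standard normal variate.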
16.15 Fisher's Exact Method for 2 × 2 Tables. When the frequencies are
fairly small, as in the preceding example, the probabilities of the various
possible arrangements of the table, with fixed marginal totals, can be
calculated exactly. The probability, in fact, of the arrangement with frequencies
$a$, $b$, $c$, $d$ in the four cells, is given by

(16.35)    $p = (r_1!\,r_2!\,c_1!\,c_2!)/(a!\,b!\,c!\,d!\,N!)$

Thus, for the data of Table 32, there are eight possibilities which may be
set out as follows:

     1   7       2   6       3   5       4   4
    12   0      11   1      10   2       9   3
      (1)         (2)         (3)         (4)

     5   3       6   2       7   1       8   0
     8   4       7   5       6   6       5   7
      (5)         (6)         (7)         (8)

Of these, number (3) is the one actually obtained. The probabilities for the
eight tables are given by

    Table    (1)   (2)   (3)    (4)    (5)    (6)   (7)   (8)
    9690p      1    42   462   1925   3465   2772   924    99

(being all multiplied by the common factor 9690 to avoid fractions).

The probability of the arrangement (3) and of all more unlikely ones in
the same direction (that is to say, with deviations from expectation in the
same direction) is

    $(1 + 42 + 462)/9690 = 505/9690 = 0.052$

This corresponds to one tail of the distribution, whereas the probability
calculated from $\chi_s^2$ corresponds to both tails.* If the Fisher probability, as
calculated previously, is multiplied by two, we get 0.104, which is very close
to the probability for $\chi^2$ as obtained after applying Yates' correction.
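The eight exact probabilities can be reproduced with exact rational arithmetic. The Python sketch below is illustrative. It writes the factorial formula in the equivalent hypergeometric form $\binom{c_1}{a}\binom{c_2}{b}/\binom{N}{r_1}$, and the function name `table_probability` is hypothetical.

```python
from fractions import Fraction
from math import comb


def table_probability(a, r1, r2, c1):
    """Exact probability of a 2 x 2 table with top-left cell a and fixed margins.

    Equivalent to r1! r2! c1! c2! / (a! b! c! d! N!).
    """
    n = r1 + r2
    c2 = n - c1
    b = r1 - a
    return Fraction(comb(c1, a) * comb(c2, b), comb(n, r1))


# Margins of the example: rows 8 and 12, first column 13.
r1, r2, c1 = 8, 12, 13

# The top-left cell can run from 1 to 8, giving arrangements (1) to (8).
probabilities = [table_probability(a, r1, r2, c1) for a in range(1, 9)]
scaled = [int(p * 9690) for p in probabilities]

# Arrangement (3) together with the more extreme arrangements (1) and (2).
one_tail = sum(probabilities[:3])

print(scaled)
print(one_tail, float(one_tail))
```

Using `Fraction` keeps every probability exact, so the scaled values are whole numbers and the eight probabilities sum to exactly 1.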

* By definition, the quantity $\chi_s^2$ depends on the squares of the differences $a - \alpha$, etc., and
so is always positive. For distributions of cell frequencies near one extreme, the values of
$a - \alpha$, etc., will be opposite in sign to the corresponding values for distributions near the
other extreme, but both will usually give comparatively large values for $\chi_s^2$.

If the smallest frequency in the table is above expectation, instead of below
it, the "tail" will correspond to even higher values. Thus, for the
table with frequencies 66, 7 in the first row and 7, 4 in the second, the
expected value corresponding to the observed
frequency 4 is 121/84 = 1.44. The probability of the observed
arrangement and of more unlikely ones in the same direction is,
therefore, the sum of the probabilities for tables with 4, 5, 6, 7, 8, 9, 10, and 11 in the
lower right-hand cell, and this is about 0.0336. The corrected value of $\chi_s^2$ is
3.90, which corresponds to a two-tailed probability of 0.048. The poor
agreement in this case with the exact value is not surprising in view of the very
small expected frequency in one cell. With the smallest expected frequency
less than 10, the exact method should as a rule be used, or alternatively, a
table given in Fisher and Yates' Statistical Tables (Table VIII) will give an
idea of the significance of the observed arrangement.

It may be noted that when there is only one degree of freedom for $\chi^2$ the
distribution of $\chi$ (the square root of $\chi^2$) is normal. In the example of §16.14
the value of $\chi_s^2$, after applying Yates' correction, is 2.65, corresponding to
$\chi_s = 1.63$, and the probability of a value greater than 1.63 for a standard
normal variate is about 0.052. The probability obtained from a table of
$\chi^2$ is double this, because if $\chi^2 > 2.65$, then either $\chi > +1.63$ or $\chi < -1.63$,
and the sum of the probabilities for these alternatives is 0.104.

16.16 Problems Involving Three Variates. If we have three variates
$x$, $y$, and $z$, which may be mutually related, the problems of correlation and
regression become much more complicated, and in this book we can only
touch on them very lightly. If we naturally think of $z$ as dependent on
both $x$ and $y$, we can fit by least squares an equation of the form

(16.36)    $Z = a + bx + cy$

and the technique is very similar to that of fitting a parabola in the case of
two variables. The equations for finding $a$, $b$, and $c$ are

           $\sum z = Na + b\sum x + c\sum y$
(16.37)    $\sum xz = a\sum x + b\sum x^2 + c\sum xy$
           $\sum yz = a\sum y + b\sum xy + c\sum y^2$

The dot diagram will be a three-dimensional affair and the assumption in
choosing an equation of the form (16.36) is that the dots lie more or less in
a plane, scattering above and below the plane in a direction parallel to the
$z$-axis.

There will be three ordinary correlation coefficients between the three
variates, namely, $r_{xy}$, $r_{yz}$, and $r_{xz}$, but there are also partial and multiple
correlation coefficients. The multiple correlation coefficient of $z$ on $x$ and $y$,
denoted by $R_{z\cdot xy}$, is the ordinary correlation coefficient between the observed


66 7 

7 4 
values $z$ and the computed values $Z$ as given by (16.36). Its square
represents the part of the total variance of $z$ which is explained by the regression
of $z$ on $x$ and $y$, and it may be proved that

(16.38)    $R_{z\cdot xy}^2 = (r_{xz}^2 + r_{yz}^2 - 2r_{xy}r_{xz}r_{yz})/(1 - r_{xy}^2)$

The partial correlation coefficient of two variates $x$ and $y$ is defined as the
ordinary correlation coefficient of $x$ and $y$ when the influence of the third
variate $z$ is eliminated. This influence is eliminated by subtracting from $x$
the estimated $X$ due to the regression of $x$ on $z$, and similarly subtracting from
$y$ the estimated $Y$ due to the regression of $y$ on $z$. That is

(16.39)    $x_z = x - \bar x - r_{xz}s_x(z - \bar z)/s_z$
           $y_z = y - \bar y - r_{yz}s_y(z - \bar z)/s_z$

and the partial coefficient of correlation of $x$ and $y$ (denoted by $r_{xy\cdot z}$) is then
the ordinary coefficient for $x_z$ and $y_z$. It can be proved that

(16.40)    $r_{xy\cdot z} = (r_{xy} - r_{xz}r_{yz})/[(1 - r_{xz}^2)(1 - r_{yz}^2)]^{1/2}$

The first step in calculating multiple and partial correlations is therefore
the calculation of the ordinary correlation coefficients between each pair of
variates. We shall not carry the subject any further here, and for questions
of significance and the extension to more variables, we refer the student to
the relevant sections of Part Two.
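Once the three ordinary coefficients are known, the partial and multiple coefficients follow by direct substitution in (16.40) and (16.38). The Python sketch below is an illustration. The function names are hypothetical, and the numerical values 0.80, 0.70, 0.60 are those given in Exercise 18 (intelligence and achievement, intelligence and age, achievement and age).

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Partial correlation of x and y with z eliminated, formula (16.40)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def multiple_correlation_sq(r_xy, r_xz, r_yz):
    """Squared multiple correlation of z on x and y, formula (16.38)."""
    return (r_xz ** 2 + r_yz ** 2 - 2 * r_xy * r_xz * r_yz) / (1 - r_xy ** 2)


# Exercise 18: x = intelligence, y = school achievement, z = age.
r_partial = partial_correlation(r_xy=0.80, r_xz=0.70, r_yz=0.60)

# Illustration of (16.38) with the same three coefficients, taking
# intelligence as the dependent variate and achievement and age as predictors.
R_sq = multiple_correlation_sq(r_xy=0.60, r_xz=0.80, r_yz=0.70)

print(round(r_partial, 2), round(R_sq, 3), round(math.sqrt(R_sq), 3))
```

The multiple coefficient can never be smaller in absolute value than either ordinary coefficient of the dependent variate with a predictor, which gives a quick check on the arithmetic.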


Exercises 

1. Verify equation (16.8) from the expressions for $S_x$ and $S_y$.

2. A group of 28 students is given a one-hour test in mechanics shortly before Christmas
and another similar test in February. If x and y represent scores on the two tests, $\sum x$ =
1092, $\sum x^2$ = 62,260, $\sum y$ = 1796, $\sum y^2$ = 126,611, $\sum xy$ = 76,628. Make an analysis
of variance of the data, on the lines of Table 63, §16.2, separating the variation into the part
between students, the part between the two tests, and the part attributable to error. (The
variation between the tests is what corresponds to the "practice effect" in Table 63. It
depends on the difference of difficulty between the tests as well as on the effect of increased
knowledge of the subject and practice in writing examinations.)

3. If $z = x + y$, write out a formula equivalent to (16.10) involving $s_z^2$. What does this
formula become when x and y are ranks?

4. Twelve salesmen are ranked in order of merit for efficiency by their manager. They
are also ranked in accordance with their length of service. What indication is there of a
relation between length of service and efficiency? (Garrett.)

    Salesman    Years of    Order of Merit    Order of Merit
                Service       (Service)          (Effic.)
       A           5             7.5                6
       B           2            11.5               12
       C          10             2                  1
       D           8             4                  9
       E           6             6                  8
       F           4             9                  5
       G          12             1                  2
       H           2            11.5               10
       I           7             5                  3
       J           5             7.5                7
       K           9             3                  4
       L           3            10                 11

Ans. r = 0.80.


5. Find Spearman's r for the following data:

            Rank (x)   Score (x)   Rank (y)   Score (y)
    A          1          92          2          88
    B          2          89          4          86
    C          3          87          1          93
    D          4          86          6          79
    E          5          83          7          70
    F          6          77          3          87
    G          7          71          9          62
    H          8          62          5          84
    I          9          53         10          41
    J         10          40          8          64

Ans. r = 0.733.

6. Calculate the Pearson coefficient of correlation for the scores in Exercise 5, and
compare with the Spearman coefficient for the ranks.

7. Two judges rank seven candidates in a beauty contest as in the following table:

    Contestant    Judge 1    Judge 2
        A            2          3
        B            1          4
        C            4          2
        D            5          5
        E            3          1
        F            7          6
        G            6          7

Compute the correlation coefficient between the two rankings. Assuming that for a
sample of 7 pairs drawn from a population of values of independent variates x and y the
computed rank correlation coefficient will exceed 0.714 in not more than 5% of cases and
will exceed 0.893 in not more than 1%, what conclusion regarding the judges may be drawn
from the above data?

8. Fit a parabola to the following data:

    x     0     1     2     3     4     5     6
    y     2     8    12    13    12    10     8

Calculate the correlation index. Ans. $y = 2.52 + 6.07x - 0.881x^2$; correlation index = 0.982.


9. In the following table, x represents age in years for males, and y the mean vital
capacity (Holzinger, Biometrika, 16, 1924, pp. 141-2).

      x       y         x       y         x       y
    19.5    227       37.5    222       55.5    201
    22.5    230       40.5    218       58.5    185
    25.5    230       43.5    216       61.5    200
    28.5    237       46.5    210       64.5    169
    31.5    227       49.5    205       67.5    160
    34.5    229       52.5    193       70.5    103

Find the equation of the best-fitting parabola, and the correlation index.

Ans. $y = 218.0 + 1.231x - 0.02935x^2$; correlation index = 0.968.

10. Table 68 gives data on heights (y) and weights (x) of 200 freshmen.
Calculate:

(a) the two means and the two standard deviations,

(b) the regression line of height on weight,


Table 68. Heights and Weights of 200 Freshmen
(Heights to Nearest ^ Inch; Weights to Nearest J Pound) 


X 

90- 

99.6 

100- 

no- 

120- 

130- 

140- 

150- 

160- 

170- 

180- 

190- 

200- 

209.5 

Sy 

78- 

77.9 


1 

1 

1 


1 



i 

i 



- .i 


1 

74- 







1 ' 

1 

1 

1 

1 

1 

1 

i 


4 

72- 




1 

1 

1 

4 


1 

i 

1 


8 

70- 


1 

1 


6 

7 

6 

2 

1 

I 

1 2 

1 

1 

29 

«»- 



2 

8 

17 

8 

i 

2 

1 

1 

1 

1 


49 

eft- 



8 

16 

14 1 

i 


i 

2 

1 



1 

61 

64- 


3 

8 

7 

7 

3 

3 

1 

1 




33 

62- 

1 

4 

1 

7 

1 








14 






i 








0 

ftft- 

W.9 


1 

I 

1 










1 

/. 

j 1 

i : 

8 

20 

42 

46 

1 32 

i 

1 29 

1 ^ 

1 

» 

4 

2 

2 

aoo 
(c) the regression line of weight on height,

(d) the Pearson coefficient of correlation between height and weight,

(e) the correlation ratios,

(f) the column and row means.

Ans. (a) $\bar x$ = 138.45 lb, $\bar y$ = 67.82 in., $s_x$ = 19.4 lb, $s_y$ = 2.74 in.,
(b) $Y = 0.070x + 58.11$, (c) $X = 3.503y - 99.15$,
(d) $r = 0.50$, (e) $E_{yx} = 0.55$, $E_{xy} = 0.53$.

11. For Exercise 10, plot the straight regression lines and the lines of column and row
means. Test the significance of the deviation from linearity in both cases.

12. Table 69 shows for male lives the correlation between the age (x) of an insured person
at the time of issue of a policy and the age (y) of the insured at death. (Data of Midland Life
Insurance Company, 1906-1924; see Ref. 5.)

Find $E_{yx}$, $E_{xy}$, and $r$, and test the significance of the departures from linearity for the
curves of column means and row means.

13. In the accompanying contingency table, x represents a rating given to each of a group
of university freshmen on the basis of high school reports and y represents the final standing
in degree examinations for the same group. Discuss the association between these two
attributes.

                   fair    good    excellent
    3rd class       73      67        10
    2nd class       64      84        15
    1st class        5      24        28

14. In Table 70, x represents number of inches of water applied by irrigation to a crop
and y represents the crop yield in bushels per acre. The numbers shown in the table are
class marks of the various classes. Find the equation of regression of y on x and test whether
it departs significantly from linearity.

Table 70. Crop Yield and Irrigation


15. In a public opinion survey, the following questions were asked:

(1) Do you drink beer?

(2) Are you in favor of local option on the sale of liquor?

The results in one district were as shown in the accompanying table. Does this
table provide good evidence of an association between drinking habits and
opinion on the subject of local option?

                          Local Option
                         For    Against
    Drinkers              18       89
    Non-drinkers          46       87

16. (Yule and Kendall) For a certain district in England during 20 years, records were
kept of the following variables:

    x = spring rainfall in inches
    y = accumulated temperature in °F above 42°F in spring
    z = seed-hay crop in cwt/acre

The following results were obtained:

    $\bar x = 4.91$,      $\bar y = 594$,      $\bar z = 28.02$
    $s_x = 1.10$,      $s_y = 85$,      $s_z = 4.42$
    $r_{xz} = 0.80$,    $r_{yz} = -0.40$,    $r_{xy} = -0.56$

Calculate the regression equation of hay crop on spring rainfall and accumulated
temperature.

Hint. Equation (16.36) can be written

    $Z - \bar z = b(x - \bar x) + c(y - \bar y)$

By solving the equations (16.37), show that

    $b = (b_{zx} - b_{zy}b_{yx})/(1 - r_{xy}^2)$

    $c = (b_{zy} - b_{zx}b_{xy})/(1 - r_{xy}^2)$

where $b_{zx} = r_{zx}s_z/s_x$ = regression coefficient of z on x, and similarly for the other b's.

17. Calculate the three partial correlation coefficients in Exercise 16 and also the multiple
correlation coefficient of z on x and y.

18. (Garrett) Given that, for a group of children between the ages of 8 to 14, the ordinary
coefficients of correlation between intelligence and school achievement, between intelligence
and age, and between school achievement and age, are 0.80, 0.70, 0.60, respectively. What
is the correlation coefficient between intelligence and school achievement in children of the
same age? Ans. 0.67

REVIEW QUESTIONS AND PROBLEMS

1. Define the following terms: statistics, variate, discrete, class interval, class mark,
x-array of y's, range, regression line, sample, universe, coefficient of variation, variance.

2. Define the following terms: statistic, percentile, index number, mean absolute
deviation, rth moment about the mean, standard normal variate, 95% confidence limits, 1% level
of significance, null hypothesis, non-parametric statistic, rank correlation coefficient,
correlation ratio.

3. Name and define five averages. Discuss their advantages and limitations.

4. What does a ratio chart show that a chart with a uniform scale does not? If you
wished to plot data so as to secure the effect of a ratio chart, but had no ratio paper
available, how would you accomplish the desired result?

5. Prove the following:

(a) The algebraic sum of the deviations of the observations from their mean is zero.

(b) The second moment about an arbitrary point equals the second moment about the
mean increased by the square of the distance between the arbitrary point and the mean.

6. Define, and explain how to compute, the following quantities for a grouped
distribution; ft, ft a*.

7. Write the equation of the normal curve (a) with mean m and standard deviation $\sigma$
and (b) with mean 0 and standard deviation 1.

State some of the properties of this curve.

8. Give two of the formulae for r. Discuss the use or uses of correlation in any problem
that occurs to you.

9. Define the correlation ratio. Discuss its use.

10. The following is a reduced distribution of the breakfast checks at a cafeteria. Find
$\bar{x}$ and $s_x$.

      x        f
     8-12      4
    13-17      8
    18-22     24
    23-27     21
    28-32     15
    33-37     14
    38-42      7
    43-47      4
    48-52      2
    53-57      1

Ans. $\bar{x}$ = 27.2¢, $s_x$ = 9.4¢.
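The computation for a grouped distribution of this kind can be sketched as follows. This is an illustrative sketch only: the class marks are taken as the midpoints of the class intervals (the last class being read as 53-57), the divisor N rather than N − 1 is assumed for the variance, and the function name `grouped_mean_sd` is hypothetical.

```python
import math

# Class marks (midpoints of 8-12, 13-17, ..., 53-57) and their frequencies.
class_marks = [10, 15, 20, 25, 30, 35, 40, 45, 50, 55]
frequencies = [4, 8, 24, 21, 15, 14, 7, 4, 2, 1]

def grouped_mean_sd(marks, freqs):
    """Mean and standard deviation (divisor N) of a grouped distribution."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(marks, freqs)) / n
    variance = sum(f * (x - mean) ** 2 for x, f in zip(marks, freqs)) / n
    return mean, math.sqrt(variance)

mean, sd = grouped_mean_sd(class_marks, frequencies)
print(round(mean, 1), round(sd, 1))  # 27.2 9.4
```

With the divisor N the result reproduces the printed answer; the divisor N − 1 would give a standard deviation of about 9.5 instead.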

11. Derive the relations which give the third and fourth moments about the mean in
terms of moments about the origin. Define $\alpha_3$ and $\alpha_4$. What information do they give?

12. Compute the value of $\alpha_3$ and of $\alpha_4$ for the distribution in Exercise 10.

13. (Walker) An algebra test was given to 400 high school children, of whom 150 were
boys and 250 were girls. The results were as follows:

    $N_1 = 150$           $N_2 = 250$
    $\bar{x}_1 = 72.5$    $\bar{x}_2 = 73.6$
    $s_1 = 7.0$           $s_2 = 6.4$

Find the mean and standard deviation of the combined groups.
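One way of combining two groups is sketched below. It assumes that each standard deviation was computed with the divisor N, and it reads the group sizes as 150 and 250 (which sum to the 400 children stated); the function name `combine_groups` is hypothetical.

```python
import math

def combine_groups(n1, mean1, sd1, n2, mean2, sd2):
    """Mean and standard deviation (divisor N) of two groups pooled together."""
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    # Each group contributes its own variance plus the squared distance
    # of its mean from the combined mean.
    variance = (
        n1 * (sd1**2 + (mean1 - mean) ** 2)
        + n2 * (sd2**2 + (mean2 - mean) ** 2)
    ) / n
    return mean, math.sqrt(variance)

combined_mean, combined_sd = combine_groups(150, 72.5, 7.0, 250, 73.6, 6.4)
print(round(combined_mean, 2), round(combined_sd, 2))
```

The combined variance exceeds the weighted average of the two group variances whenever the group means differ, because the separation of the means adds to the spread.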

14. For a normal distribution of 1500 students' grades, $\mu = 75$, $\sigma = 10$. What values
of z will include the middle 600 grades? How many grades were below 60; above 90?
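A question of this type is ordinarily answered from a table of normal areas. The same steps can be sketched with `statistics.NormalDist` from the Python standard library; the sketch assumes the mean is 75 and the standard deviation 10, and the variable names are illustrative.

```python
from statistics import NormalDist

grades = NormalDist(mu=75, sigma=10)
n_students = 1500

# The middle 600 of 1500 grades is the central 40 per cent of the area,
# so the upper limit leaves 0.5 + 0.2 = 0.7 of the area below it.
central_fraction = 600 / n_students
z = NormalDist().inv_cdf(0.5 + central_fraction / 2)

below_60 = n_students * grades.cdf(60)        # z = -1.5
above_90 = n_students * (1 - grades.cdf(90))  # z = +1.5

print(round(z, 3), round(below_60), round(above_90))
```

By symmetry the limits are −z and +z, and the counts below 60 and above 90 are equal.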

15. Suppose a distribution of 1000 breakfast checks from the cafeteria mentioned in
problem 10 showed the following results: $\bar{x}$ = 27¢, $\sigma_x$ = 9¢, $\alpha_3 = 0$, $\alpha_4 = 3$. On the basis
of these results what is the expected frequency in the 23-27¢ class interval?

16. Given the following data as to the heights (y) and weights (x) of college men:

$\sum y$ = 6,800, $\sum y^2$ = 463,025, $\sum xy$ = 1,022,260,

$\sum x$ = 16,000, $\sum x^2$ = 2,272,600, $N$ = 100.

Find $\bar{x}$, $\bar{y}$, $s_x$, $s_y$, $r$.

17. Derive the expression for the standard error of estimate,

$s_{yx} = s_y (1 - r^2)^{1/2}$.

18. Discuss the use of $s_{yx}$ in predictions.

19. For Table 71, (a) find the correlation coefficient, (b) find the equations of the lines
of regression, (c) locate the coordinate axes through the arithmetic mean of the table and
plot the lines obtained in (b).

TABLE 71. CORRELATION TABLE FOR MONTHLY RAINFALL IN INCHES AT IOWA CITY
AND DES MOINES FOR 30 CONSECUTIVE YEARS



20. How does the scatter diagram assist one in deciding whether the regression is linear
or non-linear? Give the formulas for the correlation coefficient and for the correlation ratio
of y on x, explaining the meaning of the letters used. How would you use these indices
of correlation to decide whether the regression of y on x is linear or non-linear?

21. (a) In a normal distribution in which $\bar{x} = 0$ and $\sigma_x = 4$, what proportion of the data
lie where x > 12?

(b) If 100 of the observations lie between x = 6 and x = −8, how many of the data
are there in the whole distribution?


22. (a) Expand (a + b + c +

(b) The expansion of $(x_1 + x_2 + \cdots + x_n)^2$ consists of the sum of the squares of the x's
plus the sum of their products taken two at a time. Express this expansion in summation
notation.

23. Given N pairs of variates: $(x_{11}, x_{21}); (x_{12}, x_{22}); (x_{13}, x_{23}); \ldots; (x_{1N}, x_{2N})$. Show that:

(a) the mean $\bar{x}$ of all the variates is

$\bar{x} = \frac{1}{2N} \sum_{i=1}^{N} (x_{1i} + x_{2i})$

(b) the variance $s^2$ taken about the $\bar{x}$ in (a) is

$s^2 = \frac{1}{2N} \left[ \sum_{i=1}^{N} (x_{1i} - \bar{x})^2 + \sum_{i=1}^{N} (x_{2i} - \bar{x})^2 \right]$

Note. The quantity

$r' = \frac{1}{N s^2} \sum_{i=1}^{N} (x_{1i} - \bar{x})(x_{2i} - \bar{x})$

where $\bar{x}$ and $s^2$ are defined as in (a) and (b) is called the intra-class correlation coefficient.
For its use see Statistical Methods for Research Workers, by R. A. Fisher (§38), Oliver and
Boyd, London, 10th ed., 1946.

24. Let $S_k = \sum_{i=1}^{N} i^k$. Prove that $S_1 = N(N + 1)/2$,

$S_2 = N(N + 1)(2N + 1)/6$, $S_3 =$
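A numerical check is no substitute for the proof asked for, but it is a useful safeguard against a misremembered formula. The sketch below compares the closed forms for the sums of the first N integers and of their squares with direct summation; the function name `power_sum` is hypothetical.

```python
def power_sum(n: int, k: int) -> int:
    """Sum of the k-th powers of the integers 1, 2, ..., n by direct addition."""
    return sum(i**k for i in range(1, n + 1))

# Compare direct summation with the closed forms for N = 1, ..., 50.
for n in range(1, 51):
    assert power_sum(n, 1) == n * (n + 1) // 2
    assert power_sum(n, 2) == n * (n + 1) * (2 * n + 1) // 6

print(power_sum(10, 1), power_sum(10, 2))  # 55 385
```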

25. Sketch the graph of $y = Ae^{Bx}$, $-\infty < x < \infty$, when (a) both A and B are positive,
(b) A is positive and B negative, (c) A is negative and B positive, (d) both A and B
are negative.

26. For N correlated values of x and y the regression equation of y on x is found to be
$y = 1 + x$. If $\bar{x} = 0$, $r = 0.5$, and $s_x = 1$, determine $\bar{y}$ and $s_y$.

27. Discuss the properties of the normal correlation surface and their use in passing
judgment on the reliability of predictions based upon the regression line of y on x.

28. Show how to fit a parabola by the method of moments. 

29. A correlation coefficient of 0.503 is said to be highly significant. Assuming that this
refers to the 1% level of significance, what is the least number of pairs of observations that
must have been made in order to warrant this statement? Ans. 25.

Hint. Assume a normal distribution of $z'$ with variance $1/(N - 3)$. See §15.8.
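The hint can be turned into a short calculation. The sketch below assumes a two-tailed 1% test and takes r = 0.503, the value consistent with the printed answer of 25; the function name `min_pairs_for_significance` is hypothetical.

```python
import math
from statistics import NormalDist

def min_pairs_for_significance(r: float, alpha: float = 0.01) -> int:
    """Least N for which z' = atanh(r) exceeds the two-tailed critical value.

    z' is treated as normal with variance 1/(N - 3), so significance
    requires z' * sqrt(N - 3) >= z_crit.
    """
    z_prime = math.atanh(r)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 2.576 for alpha = 0.01
    return math.ceil(3 + (z_crit / z_prime) ** 2)

print(min_pairs_for_significance(0.503))  # 25
```

The required N falls quickly as r increases: on the same assumptions a coefficient of 0.603 would need only 17 pairs.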

30. In Problem 13, is the difference of mean scores between the boys and the girls signi- 
ficant? 

31. Explain the meaning of chi-square ($\chi^2$) and how it is used as a test of the goodness of
fit of a theoretical frequency curve to a distribution.

32. Describe the theoretical Poisson distribution and give examples of actual distributions 
which approximate to it. 

33. What is the probability of getting a total of either 7 or 11 in a throw with
two dice?
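Small problems of this kind can be verified by enumerating the equally likely outcomes. The following sketch assumes two fair six-sided dice; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))
favourable = [pair for pair in outcomes if sum(pair) in (7, 11)]

probability = Fraction(len(favourable), len(outcomes))
print(probability)  # 2/9
```

Six outcomes give a total of 7 and two give a total of 11, so the probability is 8/36.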

34. Two groups of guinea pigs, as similar as possible, were inoculated with a certain
disease. One group of 20 was used as a control. The other group of 30 was treated with a
drug supposed to have curative properties. Sixteen of the control group and nine of the
treated group died within a week. Discuss the significance of this result.

35. The following table gives the gains in weight (in grams) in a certain period for 10 pairs
of rats, one of each pair being fed on raw peanuts and the other on roasted peanuts, the
remainder of the diet being identical for both members of the pair. Discuss whether the
observed means or the variances are significantly different.

    Pair No.    1    2    3    4    5    6    7    8    9   10
    Raw        61   60   66   63   56   63   59   66   44   61
    Roasted    65   64   47   69   61   61   67   54   62   58

36. Make an analysis of variance of the data in Problem 35.

Hint. Subtract 60 from each observation to make the numbers easier to deal with. The
total number of degrees of freedom is 19, of which 9 are between pairs, 1 between diets, and
9 attributable to error. The sum of squares for the pairs may be calculated like the sum of
squares for individuals in §16.2. The variations between pairs and between diets both turn
out to be non-significant.

37. (Bertrand) The proprietor of a gambling establishment complains to the makers
of a roulette wheel which he has installed that the wheel seems to favor red, and that patrons
have noticed it. In 1000 trials of which he has kept a record, red has shown up 515 times,
black 455 times and white 30 times, the theoretical proportions being 18 : 18 : 1. Would
you consider the complaint justified?
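A chi-square test of goodness of fit is one natural way to examine the complaint. The sketch below takes the counts as 515, 455 and 30, which sum to the stated 1000 trials; the function name `chi_square_gof` and the choice of the 5% point for 2 degrees of freedom (5.991, from the table of chi-square) are assumptions made for illustration.

```python
observed = {"red": 515, "black": 455, "white": 30}
weights = {"red": 18, "black": 18, "white": 1}  # theoretical proportions

def chi_square_gof(observed, weights):
    """Chi-square statistic for observed counts against given proportions."""
    total = sum(observed.values())
    weight_sum = sum(weights.values())
    chi2 = 0.0
    for colour, count in observed.items():
        expected = total * weights[colour] / weight_sum
        chi2 += (count - expected) ** 2 / expected
    return chi2

chi2 = chi_square_gof(observed, weights)
CRITICAL_5_PERCENT_2_DF = 5.991  # tabulated 5% point for 2 degrees of freedom

print(round(chi2, 2), chi2 > CRITICAL_5_PERCENT_2_DF)
```

On these figures the statistic is about 4.04, which falls short of the 5% point, so the departure from the theoretical proportions would not be judged significant by this test.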

38. The records of 1000 birth registrations in a certain area are examined and it is noted 
that 510 are males. What are the 95% confidence limits for the proportion of male births 
in the population of which the 1000 may be considered as a random sample? 
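The limits asked for can be sketched with the large-sample normal approximation to the binomial. This is only one of several possible methods; the function name `proportion_confidence_limits` is hypothetical.

```python
import math
from statistics import NormalDist

def proportion_confidence_limits(successes: int, n: int, level: float = 0.95):
    """Normal-approximation confidence limits for a population proportion."""
    p = successes / n
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

lower, upper = proportion_confidence_limits(510, 1000)
print(round(lower, 3), round(upper, 3))  # 0.479 0.541
```

Since the interval includes 0.5, a sample of this size does not by itself establish an excess of male births.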

39. It is thought that two physical quantities x and y should be connected by a relation
of the form $y = ax^n$. The experimental values are

    x    0.5    1.6    2.6    5.0    10.0
    y    3.4    7.0   12.8   29.8    68.2

Find the best values of a and n. Hint. Fit a straight line to the values of log y and log x.
If Y is the theoretical value of log y, Y = log a + n log x.

40. In a survey made in Iowa, random samples of housewives, divided into rural and
urban, were asked whether they had done any canning of fruit and vegetables during the
previous season. The results are shown in the following table:

                Rural    Urban
    Done         367      274
    Not done      13       26

Does this indicate a significant difference between rural and urban housewives in respect of
their canning operations? Ans. Yes.
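For a fourfold table the chi-square statistic has a convenient closed form. The sketch below omits the continuity correction and compares the result with the 1% point for 1 degree of freedom (6.635, from the table of chi-square); the function name `chi_square_2x2` is hypothetical.

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Chi-square (no continuity correction) for the fourfold table

           a  b
           c  d
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Rows: done / not done; columns: rural / urban.
chi2 = chi_square_2x2(367, 274, 13, 26)
CRITICAL_1_PERCENT_1_DF = 6.635

print(round(chi2, 2), chi2 > CRITICAL_1_PERCENT_1_DF)
```

The statistic is about 8.53, beyond the 1% point, which agrees with the answer given.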

41. A point X is taken at random in a straight line segment AB whose middle point is O.
What is the probability that AX, BX, and AO can form a triangle? What fundamental
assumption is made in the solution?

Hint. Any two sides of a triangle are together greater than the third side. If the line AB
is of length 2a and if AX = x, then for x < a the condition is that a + x > 2a − x. If
x > a, the condition is that 2a − x + a > x. The conditions are satisfied if the point X
lies between the mid-points of AO and OB.

42. Suppose it costs one cent to draw each individual of a sample. It is required to
draw a sample of N from an infinite population in which $\sigma/\mu = 0.1$, and N is to be so large
that the probability that the sample mean differs from the population mean by more than
0.1 per cent of the latter is less than 0.01. How much will this sample cost? How much
extra will it cost to double the accuracy (that is, to replace 0.1 per cent by 0.05 per cent)?

Ans. $670, $2010.

43. Bacterial counts on 15 plates, made with the same dilution of a culture, were as
follows: 193, 168, 161, 163, 152, 171, 156, 159, 140, 183, 151, 152, 133, 164, 157. Is the
variability consistent with what would be expected if the numbers follow a Poisson law?

Hint. For a Poisson law the variance is equal to the mean. The sample variance
$s^2$ in a sample of N is such that $Ns^2/\sigma^2$ is distributed like $\chi^2$ with N − 1 degrees of freedom.

Use the mean of the sample as an estimate of $\sigma^2$ and compute $\chi^2$ from the sample variance.
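The procedure described in the hint can be sketched as follows. The counts are the fifteen values as printed in the problem; the function name `poisson_dispersion_chi2` is hypothetical, and the 5% point for 14 degrees of freedom (23.685) is taken from the table of chi-square.

```python
counts = [193, 168, 161, 163, 152, 171, 156, 159, 140, 183,
          151, 152, 133, 164, 157]

def poisson_dispersion_chi2(counts):
    """Chi-square for a Poisson dispersion test.

    The sum of squared deviations is divided by the sample mean, which
    estimates the Poisson variance; degrees of freedom are N - 1.
    """
    n = len(counts)
    mean = sum(counts) / n
    chi2 = sum((x - mean) ** 2 for x in counts) / mean
    return chi2, n - 1

chi2, dof = poisson_dispersion_chi2(counts)
CRITICAL_5_PERCENT_14_DF = 23.685

print(round(chi2, 2), dof, chi2 > CRITICAL_5_PERCENT_14_DF)
```

On these figures the statistic is about 19.93 with 14 degrees of freedom, below the 5% point, so the variability would be regarded as consistent with a Poisson law.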

44. Show that the least squares condition, $\sum (Z - z)^2 =$ min., gives, when $Z = a + bx + cy$,
the normal equations (16.37).

Hint. Calculus students will differentiate $\sum (Z - z)^2$ partially with respect to a, b,
and c. Others can use the method of §14.9, expressing $\sum (Z - z)^2$ as a quadratic in either
a, b, or c.
Table I. Ordinates and Areas of the Normal Curve, $\phi(z)$


      z     Ordinate     Area        z     Ordinate     Area
      --     .00014     .49997
     3.10    .00327     .49903      3.55    .00073     .49981
     3.11    .00317     .49906      3.56    .00071     .49981
     3.12    .00307     .49910      3.57    .00068     .49982
     3.13    .00298     .49913      3.58    .00066     .49983
     3.14    .00288     .49916      3.59    .00063     .49983
Table II. Values of t Corresponding to Given Probabilities*


    Degrees of          Probability of a deviation greater than t
    freedom n     .005      .01      .025      .05       .1       .15
        1        63.657   31.821   12.706    6.314    3.078    1.963
        2         9.925    6.965    4.303    2.920    1.886    1.386
        3         5.841    4.541    3.182    2.353    1.638    1.250
        4         4.604    3.747    2.776    2.132    1.533    1.190
        5         4.032    3.365    2.571    2.015    1.476    1.156
        6         3.707    3.143    2.447    1.943    1.440    1.134
        7         3.499    2.998    2.365    1.895    1.415    1.119
        8         3.355    2.896    2.306    1.860    1.397    1.108
        9         3.250    2.821    2.262    1.833    1.383    1.100
       10         3.169    2.764    2.228    1.812    1.372    1.093
       11         3.106    2.718    2.201    1.796    1.363    1.088
       12         3.055    2.681    2.179    1.782    1.356    1.083
       13         3.012    2.650    2.160    1.771    1.350    1.079
       14         2.977    2.624    2.145    1.761    1.345    1.076
       15         2.947    2.602    2.131    1.753    1.341    1.074
       16         2.921    2.583    2.120    1.746    1.337    1.071
       17         2.898    2.567    2.110    1.740    1.333    1.069
       18         2.878    2.552    2.101    1.734    1.330    1.067
       19         2.861    2.539    2.093    1.729    1.328    1.066
       20         2.845    2.528    2.086    1.725    1.325    1.064
       21         2.831    2.518    2.080    1.721    1.323    1.063
       22         2.819    2.508    2.074    1.717    1.321    1.061
       23         2.807    2.500    2.069    1.714    1.319    1.060
       24         2.797    2.492    2.064    1.711    1.318    1.059
       25         2.787    2.485    2.060    1.708    1.316    1.058
       26         2.779    2.479    2.056    1.706    1.315    1.058
       27         2.771    2.473    2.052    1.703    1.314    1.057
       28         2.763    2.467    2.048    1.701    1.313    1.056
       29         2.756    2.462    2.045    1.699    1.311    1.055
       30         2.750    2.457    2.042    1.697    1.310    1.055
       ∞          2.576    2.326    1.960    1.645    1.282    1.036

The probability of a deviation numerically greater than t is twice the
probability given at the head of the table.

* This table is reproduced from "Statistical Methods for Research Workers," with the
generous permission of the author, Professor R. A. Fisher, and the publishers, Messrs.
Oliver and Boyd.
Table II. Values of t Corresponding to Given Probabilities (cont.)


    Degrees of          Probability of a deviation greater than t
    freedom n      .2       .25       .3       .35       .4       .45
        1        1.376    1.000     .727      --       .325     .158
        2        1.061     .816     .617     .445      .289     .142
        3         .978     .765     .584     .424      .277     .137
        4         .941     .741     .569     .414      .271     .134
        5         .920     .727     .559      --       .267     .132
        6         .906     .718     .553      --       .265     .131
        7         .896     .711     .549     .402      .263      --
        8         .889     .706     .546     .399      .262     .130
        9         .883     .703     .543     .398       --      .129
       10         .879     .700     .542     .397      .260     .129
       11         .876     .697     .540     .396      .260     .129
       12         .873     .695     .539     .395      .259     .128
       13         .870     .694     .538     .394      .259     .128
       14         .868     .692     .537      --       .258     .128
       15         .866     .691     .536     .393      .258     .128
       16         .865     .690     .535     .392      .258     .128
       17         .863     .689     .534     .392      .257     .128
       18         .862     .688     .534     .392      .257     .127
       19         .861     .688     .533     .391      .257     .127
       20         .860     .687     .533     .391      .257     .127
       21         .859     .686     .532     .391      .257     .127
       22         .858     .686     .532     .390      .256     .127
       23         .858     .685     .532     .390      .256     .127
       24         .857     .685     .531      --       .256     .127
       25         .856     .684     .531     .390      .256     .127
       26         .856     .684     .531     .390      .256     .127
       27         .855     .684     .531     .389      .256     .127
       28         .855     .683     .530     .389      .256     .127
       29         .854     .683     .530     .389      .256     .127
       30         .854     .683     .530     .389      .256     .127
       ∞          .842     .674     .524     .385      .253     .126

The probability of a deviation numerically greater than t is twice the
probability given at the head of the table.
Table III. Values of $\chi^2$ Corresponding to Given Probabilities*


    Degrees of            Probability of a deviation greater than $\chi^2$
    freedom n     .01      .02      .05      .10      .20      .30      .50
        1        6.635    5.412    3.841    2.706    1.642    1.074     .455
        2        9.210    7.824    5.991    4.605    3.219    2.408    1.386
        3       11.341    9.837    7.815    6.251    4.642    3.665    2.366
        4       13.277   11.668    9.488    7.779    5.989    4.878    3.357
        5       15.086   13.388   11.070    9.236    7.289    6.064    4.351
        6       16.812   15.033   12.592   10.645    8.558    7.231    5.348
        7       18.475   16.622   14.067   12.017    9.803    8.383    6.346
        8       20.090   18.168   15.507   13.362   11.030    9.524    7.344
        9       21.666   19.679   16.919   14.684   12.242   10.656    8.343
       10       23.209   21.161   18.307   15.987   13.442   11.781    9.342
       11       24.725   22.618   19.675   17.275   14.631   12.899   10.341
       12       26.217   24.054   21.026   18.549   15.812   14.011   11.340
       13       27.688   25.472   22.362   19.812   16.985   15.119   12.340
       14       29.141   26.873   23.685   21.064   18.151   16.222   13.339
       15       30.578   28.259   24.996   22.307   19.311   17.322   14.339
       16       32.000   29.633   26.296   23.542   20.465   18.418   15.338
       17       33.409   30.995   27.587   24.769   21.615   19.511   16.338
       18       34.805   32.346   28.869   25.989   22.760   20.601   17.338
       19       36.191   33.687   30.144   27.204   23.900   21.689   18.338
       20       37.566   35.020   31.410   28.412   25.038   22.775   19.337
       21       38.932   36.343   32.671   29.615   26.171   23.858   20.337
       22       40.289   37.659   33.924   30.813   27.301   24.939   21.337
       23       41.638   38.968   35.172   32.007   28.429   26.018   22.337
       24       42.980   40.270   36.415   33.196   29.553   27.096   23.337
       25       44.314   41.566   37.652   34.382   30.675   28.172   24.337
       26       45.642   42.856   38.885   35.563   31.795   29.246   25.336
       27       46.963   44.140   40.113   36.741   32.912   30.319   26.336
       28       48.278   45.419   41.337   37.916   34.027   31.391   27.336
       29       49.588   46.693   42.557   39.087   35.139   32.461   28.336
       30       50.892   47.962   43.773   40.256   36.250   33.530   29.336

For larger values of n, the quantity $(2\chi^2)^{1/2} - (2n - 1)^{1/2}$ may be used as a
normal deviate with unit standard deviation.

* This table is reproduced from "Statistical Methods for Research Workers," with the
generous permission of the author, Professor R. A. Fisher, and the publishers, Messrs.
Oliver and Boyd.
Table III. Values of $\chi^2$ Corresponding to Given Probabilities (cont.)


    Degrees of            Probability of a deviation greater than $\chi^2$
    freedom n     .70       .80       .90       .95       .98        .99
        1         .148     .0642     .0158     .00393    .000628    .000157
        2         .713     .446      .211      .103      .0404      .0201
        3        1.424    1.005      .584      .352      .185       .115
        4        2.195    1.649     1.064      .711      .429       .297
        5        3.000    2.343     1.610     1.145      .752       .554
        6        3.828    3.070     2.204     1.635     1.134       .872
        7        4.671    3.822     2.833     2.167     1.564      1.239
        8        5.527    4.594     3.490     2.733     2.032      1.646
        9        6.393    5.380     4.168     3.325     2.532      2.088
       10        7.267    6.179     4.865     3.940     3.059      2.558
       11        8.148    6.989     5.578     4.575     3.609      3.053
       12        9.034    7.807     6.304     5.226     4.178      3.571
       13        9.926    8.634     7.042     5.892     4.765      4.107
       14       10.821    9.467     7.790     6.571     5.368      4.660
       15       11.721   10.307     8.547     7.261     5.985      5.229
       16       12.624   11.152     9.312     7.962     6.614      5.812
       17       13.531   12.002    10.085     8.672     7.255      6.408
       18       14.440   12.857    10.865     9.390     7.906      7.015
       19       15.352   13.716    11.651    10.117     8.567      7.633
       20       16.266   14.578    12.443    10.851     9.237      8.260
       21       17.182   15.445    13.240    11.591     9.915      8.897
       22       18.101   16.314    14.041    12.338    10.600      9.542
       23       19.021   17.187    14.848    13.091    11.293     10.196
       24       19.943   18.062    15.659    13.848    11.992     10.856
       25       20.867   18.940    16.473    14.611    12.697     11.524
       26       21.792   19.820    17.292    15.379    13.409     12.198
       27       22.719   20.703    18.114    16.151    14.125     12.879
       28       23.647   21.588    18.939    16.928    14.847     13.565
       29       24.577   22.475    19.768    17.708    15.574     14.256
       30       25.508   23.364    20.599    18.493    16.306     14.953

For larger values of n, the quantity $(2\chi^2)^{1/2} - (2n - 1)^{1/2}$ may be used as
a normal deviate with unit standard deviation.
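The large-n rule stated in the footnote can be inverted to give approximate percentage points of chi-square beyond the range of the table. The following is a sketch only; the function name `chi_square_point_large_n` is hypothetical, and the comparison with n = 30 is included merely to indicate the order of accuracy.

```python
import math
from statistics import NormalDist

def chi_square_point_large_n(n: int, upper_tail_prob: float) -> float:
    """Approximate chi-square value exceeded with the given probability.

    Solves sqrt(2 * chi2) - sqrt(2n - 1) = z for chi2, where z is the
    normal deviate cutting off the same upper-tail probability.
    """
    z = NormalDist().inv_cdf(1 - upper_tail_prob)
    return (z + math.sqrt(2 * n - 1)) ** 2 / 2

approx = chi_square_point_large_n(30, 0.05)
print(round(approx, 2))  # compare with the tabulated 43.773 for n = 30
```

At n = 30 the approximation gives about 43.49 against the tabulated 43.773, a discrepancy of less than one per cent.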
Table IV.* 5% (Roman Type) and 1% (Bold Face Type) Points for the Distribution of F

* Reproduced from Statistical Methods by G. W. Snedecor by permission of the author and the publisher, Collegiate Press, Inc., Ames, Iowa.

Table V. Random Sampling Numbers*




5-8 
First Thousand
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3r*40 
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»3 *5 

75 48 

59 01 

83 7a 

59 93 

76 24 

97 08 

86 95 
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67 44 
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55 50 

43 10 

53 74 

35 08 

90 61 

18 37 
* Reproduced with the permission of Professor E. S. Pearson from M. G. Kendall and
B. Babington Smith, Tables of Random Sampling Numbers (Tracts for Computers, No. 24),
Cambridge Univ. Press.

Table V. Random Sampling Numbers (cont.)
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82 83 

93 51 

48 56 

54 10 

7 a 3 * 

13 

83 73 

52 25 

99 97 

97 78 

12 48 

36 83 

89 95 

60 32 

41 06 

76 14 

14 

08 59 

52 18 

26 54 

65 50 

82 04 

87 99 

ox 70 

33 56 

25 80 

tS 3 84 

^5 

41 87 

3a 7 « 

49 44 

29 36 

94 58 

16 82 

86 39 

62 15 

86 43 

54 31 

i6 

00 47 

37 59 

08 56 

23 81 

22 42 

72 63 

17 63 

14 47 

25 20 

63 47 


86 13 

IS 37 

89 81 

38 30 

78 68 

89 13 

29 61 

82 07 

GO 98 

64 3a 

i8 

33 84 

97 83 

59 04 

40 20 

35 86 

03 17 

68 86 

63 08 

01 82 

as 46 

19 

61 87 

04 16 

57 07 

46 80 

86 12 

98 08 

39 73 

49 20 

77 54 

50 91 

20 

43 89 

86 59 

23 25 

07 88 

61 29 

78 49 

19 76 

53 91 

50 08 

07 86 

21 

29 93 

93 91 

23 04 

54 84 

59 85 

60 95 

20 66 

41 28 

7a 64 

64 73 

22 

38 50 

58 55 

55 14 

38 8s 

50 77 

18 65 

79 48 

87 67 

83 17 

08 19 

^3 

31 82 

43 84 

31 67 

12 52 

55 11 

72 04 

41 15 

6a 53 

*7 98 

22 68 

24 

91 43 

00 37 

67 13 

56 ll 

55 97 

06 7 S 

09 25 

52 02 

39 13 

87 53 

^5 

3863 

S6 89 

76 85 

49 89 

75 a6 

96 45 

80 38 

05 04 

1 1 66 

35 14 


Fourth Thousand

1-4   5-8   9-12   13-16   17-20   21-24   25-28   29-32   33-36   37-40

z 

02 49 

05 41 

22 27 

94 43 

93 64 

04 23 

07 20 

74 II 

67 95 

40 82 

2 

X I 96 

73 64 

69 60 

62 78 

37 ox 

09 25 

33 02 

08 ox 

38 53 

74 82 

3 

48 85 

68 34 

65 49 

69 92 

40 79 

OS 40 

33 51 

54 39 

6x 30 

31 36 

4 

87 84 

67 30 

80 21 

48 X 2 

35 36 

04 88 

18 99 

77 49 

48 49 

30 71 

5 

3 * S 3 

27 7 * 

65 72 

43 07 

07 22 

86 52 

91 84 

57 92 

65 7 x 

00 X I 

6 

66 75 

79 89 

55 92 

37 59 

34 31 

43 20 

45 58 

25 45 

44 38 

9a 65 

7 

1X26 

63 45 

45 76 

50 59 

77 46 

34 66 

82 69 

99 a 6 

74 29 

75 16 

8 

17 87 

83 91 

42 45 

56 18 

01 46 

93 13 

74 89 

*4 64 

25 75 

92 84 

9 

62 56 

13 03 

65 03 

40 81 

47 54 

51 79 

80 8x 

33 61 

ox 09 

77 30 

10 

62 79 

63 07 

79 35 

49 77 

os 01 

30 10 

50 81 

33 00 

99 79 

19 70 

II 

75 51 

02 X 7 

71 04 

33 93 

36 60 

42 75 

76 22 

83 87 

56 54 

84 68 

12 

87 43 

90 16 

91 63 

51 72 

65 90 

44 43 

70 72 

17 98 

70 63 

90 32 

^3 

97 74 

20 26 

21 10 

74 87 

88 03 

38 33 

76 52 

26 92 

14 95 

90 51 


98 81 

10 60 

01 2X 

57 10 

28 75 

2 X 82 

88 39 

12 85 

18 86 

it 24 

^5 

5X 26 

40 x8 

5a 64 

60 79 

25 53 

29 00 

42 66 

95 78 

5836 

29 98 

z6 

40 23 

99 33 

76 10 

41 96 

86 10 

49 12 

00 29 

4X 80 

03 59 

93 X 7 

77 

26 93 

6s 91 

86 51 

66 72 

76 45 

46 38 

94 48 

81 94 

19 06 

66 47 

z8 

88 50 

81 17 

16 98 

29 94 

09 74 

42 39 

46 22 

00 69 

09 48 

x6 46 

79 

63 49 

93 80 

93 25 

59 36 

19 95 

79 86 

78 05 

69 01 

02 33 

83 74 

20 

36 37 

98 12 

06 03 

31 77 

87 10 

73 8a 

83 10 

83 60 

50 94 

40 91 

21 

93 80 

12 23 

22 47 

47 95 

70 17 

59 33 

43 06 

47 43 

06 12 

66 60 

22 

*9 85 

68 71 

20 56 

31 IS 

00 53 

as 36 

58 12 

65 22 

41 40 

24 31 

2$ 

97 7 a 

08 79 

3X 88 

26 51 

30 50 

71 01 

71 51 

77 06 

95 79 

29 X 9 

^4 

85 as 

70 91 

OS 74 

60 14 

63 77 

S 9 93 

8x 56 

47 34 

17 79 

a 7 53 


75 74 

67 5 » 

68 3x 

7a 79 

57 73 

7a 36 

48 73 

*4 36 

87 90 

68 02 
Table V. Random Sampling Numbers (cont.)



Fifth Thousand

1-4   5-8   9-12   13-16   17-20   21-24   25-28   29-32   33-36   37-40

t 

29 93 

so 69 

7t 63 

17 55 

25 79 

10 47 

88 93 

79 61 

42 82 

13 63 

a 

15 M 

40 71 

26 51 

89 07 

77 87 

75 51 

01 31 

03 4a 

94 24 

8x IX 

3 

03 87 

04 zz 

25 10 

58 98 

76 29 

22 03 

99 41 

24 38- 

12 76 

50 22 

4 

79 39 

03 91 

88 40 

75 64 

52 69 

65 95 

92 06 

40 14 

aS 42 

29 60 

5 

30 03 

so 69 

15 79 

19 6s 

44 a 8 

64 81 

95 23 

14 48 

72 i8 

IS 94 

6 

29 03 

99 98 

61 28 

75 97 

98 02 

68 S 3 

13 91 

98 38 

13 72 

43 73 

7 

78 19 

60 81 

08 24 

lo 74 

97 77 

09 59 

94 35 

69 84 

82 09 

49 56 

8 

15 84 

78 54 

93 91 

44 29 

13 51 

80 13 

07 37 

52 21 

53 91 

09 86 

9 

36 61 

46 22 

48 49 

19 49 

72 09 

92 s8 

79 20 

53 41 

02 18 

00 64 

xo 

40 54 

9 S 48 

84 91 

46 54 

38 62 

35 54 

14 44 

66 88 

89 47 

41 80 

XI 

40 87 

80 89 

97 14 

28 60 

99 8i 

90 30 

87 80 

07 SI 

58 71 

66 58 

12 

xo 22 

94 92 

82 41 

17 33 

14 68 

59 45 

SI 87 

56 08 

90 80 

66 60 

^3 

15 9 * 

87 67 

87 30 

62 42 

59 *8 

44 12 

42 50 

88 31 

13 77 

16 14 

^4 

13 40 

31 87 

96 49 

90 99 

44 04 

64 97 

94 14 

62 18 

IS 59 

83 35 

15 

66 52 

39 45 

96 74 

go 89 

02 71 

10 00 

99 86 

48 17 

64 06 

89 09 

i6 

91 66 

S 3 64 

69 68 

34 31 

78 70 

25 97 

50 46 

62 21 

27 25 

06 20 

17 

67 41 

58 75 

15 08 

20 77 

37 29 

73 20 

15 75 

93 96 

91 76 

96 99 

i8 

76 52 

79 69 

96 23 

72 43 

34 48 

63 39 

23 23 

94 60 

88 79 

06 17 

^9 

19 81 

54 77 

89 74 

34 81 

71 47 

10 95 

43 43 

55 81 

19 45 

44 07 

20 

25 59 

25 35 

87 76 

38 47 

as 75 

84 34 

76 89 

18 05 

73 95 

72 22 

ax 

55 90 

24 55 

39 63 

64 63 

16 09 

95 99 

98 28 

87 40 

66 66 

66 92 

22 

02 47 

05 83 

76 79 

79 42 

24 82 

42 42 

39 61 

6a 47 

49 II 

7a 64 

23 

18 63 

05 32 

63 13 

31 99 

76 19 

35 8s 

91 23 

50 14 

63 a8 

86 59 

24 

89 67 

33 82 

30 16 

06 39 

20 07 

59 50 

33 84 

02 76 

45 03 

33 33 

2% 

62 98 

66 73 

64 06 

59 5 * 

74 27 

84 62 

31 45 

65 8a 

86 05 

73 00 



Sixth Thousand

1-4   5-8   9-12   13-16   17-20   21-24   25-28   29-32   33-36   37-40

x 

27 50 

13 05 

46 34 

63 85 

87 60 

35 55 

05 67 

88 15 

47 00 

50 92 

2 

02 31 

57 57 

62 98 

41 09 

66 Qi 

69 88 

92 83 

35 70 

76 59 

02 58 

3 

37 43 

12 83 

66 39 

77 33 

63 26 

53 99 

48 65 

23 06 

94 29 

S 3 04 

1 ^ 

83 56 

65 54 

19 33 

35 42 

92 12 

37 14 

70 75 

18 58 

98 57 

12 52 

5 

06 81 

56 27 

49 32 

12 42 

92 42 

05 96 

82 94 

70 25 

45 49 

x8 16 

6 

39 15 

03 60 

*5 56 

73 16 

48 74 

50 27 

43 42 

5836 

73 16 

39 90 

1 7 

84 4S 

71 93 

10 27 

IS 83 

84 20 

57 42 

41 28 

42 06 

15 90 

70 47 

8 

82 47 

05 77 

06 89 

47 13 

92 8s 

60 12 

32 89 

25 22 

4» 38 

87 37 

9 

98 04 

06 70 

24 21 

69 02 

6s 4a 

55 33 

11 95 

72 35 

73 23 

57 *6 

xo 

18 33 

49 04 

14 33 

48 50 

IS 64 

58 26 

14 9* 

46 02 

72 13 

^48 62 

XI 

33 93 

19 93 

38 27 

43 40 

27 7a 

79 74 

86 57 

41 83 

58 71 

56 99 

12 

48 66 

74 30 

44 81 

06 80 

29 09 

50 31 

69 61 

24 64 

28 89 

97 79 

13 

8S 8s 

07 54 

21 50 

31 80 

10 19 

56 6s 

82 52 

26 58 

55 12 

26 34 

^4 

oS 27 

08 08 

35 87 

96 S 7 

33 12 

01 77 

52 76 

09 89 

71 12 

17 69 

Jr 5 

$9 61 

22 14 

26 09 

96 75 

17 94 

51 08 

41 91 

45 94 

80 48 

59 92 

j 6 

J 7 41 

77 79 

31 66 

36 54 

9a 85 

65 60 

53 98 

63 SO 

XI 20 

96 63 

^7 

II 26 

37 08 

07 71 

95 95 

39 75 

9a 48 

99 78 

23 33 

19 56 

06 67 

xS 

48 08 

13 98 

x6 52 

41 15 

73 96 

32 55 

03 xa 

38 30 

88 77 

17 03 

^9 

76 27 

72 22 

99 61 

7a 15 

00 25 

21 54 

47 79 

18 41 

58 so 

57 66 

20 

98 89 

22 25 

79 92 

53 55 

07 98 

66 71 

S 3 29 

61 71 

56 96 

41 78 

21 

88 69 

6x 63 

01 67 

61 88 

58 79 

35 6 s 

08 45 

63 38 

69 86 

79 47 

22 

12 58 

*3 75 

80 98 

01 35 

91 x6 

18 36 

90 54 

99 17 

68 36 

85 06 

23 

08 86 

96 36 

14 09 

43 8 s 

51 20 

65 18 

06 40 

52 17 

48 10 

68 97 

34 

33 81 

05 51 

32 48 

60 12 

32 44 

08 12 

89 00 

98 82 

79 17 

97 22 

35 

os IS 

99 28 

87 15 

07 08 

66 92 

S 3 81 

69 4a 

02 27 

65 33 

57 69 



Table V. Random Sampling Numbers (cont.)

Seventh Thousand

1-4   5-8   9-12   13-16   17-20   21-24   25-28   29-32   33-36   37-40

j 8030 23^4 6796 2133 3690 0391 69^33 9013 3448 0219 

2 6129 8961 3208 1262 2608 4200 3173 3130 3061 34 n 

S 23 33 6101 0221 1181 513a 3610 2374 5031 9011 7352 

4 9421 3292 9350 7267 2320 7459 3030 4866 7532 27 97 

5 87 6t 92 69 01 60 28 79 74 76 86 06 39 29 73 8s 03 27 50 57 

^ 3758 19*8 0342 8603 8574 4481 8645 7116 1352 3556 

7 6486 6631 5504 8840 1030 8438 0613 5883 6204 635a 

8 2269 5845 4923 0981 9884 0504 7599 2770 7279 3219 

9 2322 1422 6490 1026 7423 5391 2773 7819 9243 6810 

xo 4238 5964 7296 4657 8967 2281 9456 6984 1831 0639 

li 1718 0134 1098 3748 9386 8859 6953 7886 3726 8548 

J2 3945 6953 9489 5897 2933 2919 5094 8057 3199 3891 

X3 4318 114a 5619 4844 45 02 8429 0178 6577 7684 8885 

X4 5944 064s 6855 1665 6613 3800 9576 5067 6765 1883 

X5 0150 3432 3800 37 57 4782 6659 1950 8714 35 59 79 47 

x6 7914 6035 4795 9071 3103 8537 3870 3416 6455 6649 

xy ox 56 63 68 80 26 14 97 23 88 59 22 82 39 70 83 48 34 46 48 

x8 2576 1871 2925 1551 9296 0101 2818 0335 ix 10 2784 

xg 2352 1083 4506 4985 35 45 8408 8113 5257 2123 6702 

20 9x 64 08 64 25 74 x6 10 97 31 10 27 24 48 89 06 42 8x 29 10 

21 80 86 07 27 26 70 08 65 85 20 31 23 28 99 39 63 32 03 71 9x 

22 3x71 3760 9560 94 95 54 45 2797 0367 3054 8604 1241 

23 0583 5036 0904 3915 6655 8036 3971 2410 6222 2153 

24 98 70 02 90 30 63 62 59 26 04 97 20 00 91 28 80 40 23 09 91 

25 8279 35 45 6453 9324 8655 4872 1857 0579 20 09 3146 

Eighth Thousand

1-4   5-8   9-12   13-16   17-20   21-24   25-28   29-32   33-36   37-40

X 37 52 49 55 40 65 27 6x 08 59 91 23 26 18 95 04 98 20 99 52 

2 4816 6965 6902 0883 0883 6837 0096 13 59 12 x6 1793 

3 5043 0659 5653 30 6x 4021 2906 4960 9038 2143 1925 

4 8931 6279 45 73 7172 77 11 2880 7235 75 77 2472 9843 

5 6329 9061 8639 0738 3885 7706 xo 23 3084 0795 3076 

8 7168 9394 0872 3627 8589 4059 8337 9385 7397 840s 

7 0506 9663 5824 0595 5664 7753 8564 159s 9391 5903 

8 0335 5895 4644 2570 3166 0x05 44 44 6291 3631 4504 

9 1304 5767 74 77 53 35 93 5* 8283 2738 6316 0448 7523 

XO 4996 43 94 5604 0279 5578 0144 7526 8554 ox 8x 3282 

XX 2436 2408 4477 5707 54 4 * 0456 0944 3058 2545 3756 

^2 55*9 9720 01 II 4745 79 79 0672 1281 8697 5409 0653 

xj 0228 5460 2835 3294 3674 5*63 9690 04x3 3043 1014 

14 9050 1378 2220 3756 97 95 49 95 9**5 52 73 *293 7894 

X5 33 7* 3243 2958 4738 3998 6751 6447 49 9* 6458 9507 

x6 7058 2849 54 32 9770 2781 6469 7*52 0256 6137 0458 

Xf 0968 96x0 5778 8500 8981 9830 1940 7628 6299 9983 

x5 19 36 60 85 35 04 12 87 83 88 66 54 32 00 30 20 05 30 42 63 

X9 0475 44 49 6426 5*46 8050 53 91 0055 6736 68 66 0829 

20 7983 3239 4677 5^83 4221 6003 1447 07 ox 6685 49 22 

2X 80 99 42 43 08 58 54 4* 98 05 54 39 34 42 97 47 3® 35 59 40 

22 48 83 64 99 86 94 48 78 79 20 62 23 56 45 92 65 56 36 83 oa 

23 2845 3585 2220 13 ox 7396 7005 8450 6859 9658 1663 

24 5207 63x5 8230 6623 1426 6661 1780 4197 4027 2480 

25 39 14 52 t8 35 87 48 55 48 81 03 II 26 99 03 80 08 86 50 42 
Table VI. Values of tanh $z'$


u* 

0 

1 

a 

s 

4 

6 

6 

7 

8 

8 

M 

.0000 

0010 

0020 

0030 

0040 

0050 

0060 

0070 

0080 

0090 

m 

.0100 

0110 

0120 

0130 

0140 

0150 

0160 

0170 

0180 

0190 

m 

.0300 

0210 

0220 

0230 

0240 

0250 

0260 

0270 

0280 

0290 

m 

.0900 

0310 

0320 

0330 

0340 

0350 

0360 

0370 

0380 

0390 

M 

.0400 

0410 

0420 

0430 

0440 

0450 

0460 

0470 

0480 

0490 


.0500 

0510 

0520 

0530 

0539 

0549 

0559 

0569 

0579 

0589 

m 

.0590 

0609 

0619 

0629 

0639 

0649 

0659 

0669 

0679 

0689 


.0699 

0709 

0719 

0729 

0739 

0749 

0759 

0768 

0778 

0788 

m 

.0798 

0808 

0818 

0828 

0838 

0848 

0858 

0868 

0678 

0888 

•ee 

.0898 

0907 

0917 

0927 

0937 

0947 

0957 

0967 

0977 

0987 

•li 

.0997 

1007 

1016 

1026 

1036 

1046 

1056 

1066 

1076 

1086 

M 

.1096 

1105 

1115 

1125 

1135 

1145 

1155 

1165 

1175 

1184 

At 

.1194 

1204 

1214 

1224 

1234 

1244 

1253 

1263 

1278 

1283 

.u 

.1293 

1303 

1312 

1322 

1332 

1342 

1352 

1361 

1371 

1381 

•u 

.1391 

1401 

1411 

1420 

1430 

1440 

1450 

1460 

1460 

1479 

as 

.1489 

1499 

1508 

1518 

1528 

1638 

1547 

1557 

1567 

1577 

as 

.1586 

1596 

1606 

1616 

1625 

1635 

1645 

1655 

1664 

1674 

ai 

.1684 

1694 

1703 

1713 

1723 

1732 

1742 

1752 

1761 

1771 

.18 

.1781 

1790 

1800 

1810 

1820 

1829 

1839 

1849 

1858 

1868 

as 

.1877 

1887 

1897 

1906 

1916 

1926 

1935 

1045 

1955 

1964 

•ss 

.1974 

1983 

1993 

2003 

2012 

2022 

2031 

2041 

2051 

2060 

ai 

.2070 

2079 

2089 

2098 

2108 

2117 

2127 

2137 

2146 

2156 

J9t 

.2165 

2175 

2184 

2104 

2203 

2213 

2222 

2232 

2241 

2251 

as 

.2260 

2270 

2279 

2289 

2298 

2308 

2317 

2327 

2336 

2346 

.34 

.2355 

2364 

2374 

2383 

2393 

2402 

2413 

2421 

2430 

2440 

.38 

.2449 

2459 

2468 

2477 

2487 

2496 

2506 

2515 

2524 

2534 

.36 

.2543 

2552 

2562 

2571 

2580 

2690 

2599 

2608 

2618 

2627 

.37 

.2636 

2640 

2655 

2664 

2673 

2683 

2692 

2701 

2711 

2720 

.38 

.2729 

2738 

2748 

2767 

2766 

2775 

2784 

2794 

2803 

2812 

.33 

.2821 

2831 

2840 

2849 

2858 

2867 

2876 

2680 

2895 

2904 

as 

.2913 

2922 

2931 

2941 

2950 

2959 

2968 

2977 

aose 

2995 

.Si 

.3004 

3013 

3023 

3032 

3041 

3050 

3059 

3068 

8077 

8086 

.S3 

.3095 

3104 

3113 

3122 

3131 

3140 

3149 

3158 

3167 

8176 

as 

.3185 

3194 

3203 

3212 

3221 

8230 

3239 

3248 

8257 

8266 

.84 

.3275 

3284 

3293 

3302 

3310 

3319 

3328 

3337 

8346 

8355 

as 

.8364 

3373 

3381 

8390 

3399 

3408 

8417 

3426 

8435 

8448 

as 

.3452 

3461 

3470 

3479 

3487 

8496 

8505 

8514 

3522 

3531 

a7 

.3540 

3549 

3557 

3566 

8575 

3584 

8502 

3601 

3610 

3618 

•88 

.8027 

3630 

3644 

3653 

8662 

3670 

3679 

8688 

8696 

3705 

at 

.3714 

3722 

3731 

3739 

3748 

3757 

3765 

8774 

8782 

8791 

.43 

.3799 

3808 

3817 

8825 

3834 

3843 

3851 

8859 

3868 

3878 

.41 

.3885 

3893 

3902 

3910 

3919 

8927 

8936 

8944 

8952 

3961 


.8969 

3978 

3986 

3995 

4903 

4011 

4020 

4028 

4036 

4045 

as 

.4053 

4062 

4070 

4078 

4087 

4095 

4103 

4112 

4120 

4128 

as 

.4136 

4145 

4153 

4161 

4170 

4178 

4186 

4194 

4203 

4211 

as 

.4219 

4227 

4235 

4244 

4252 

4260 

4268 

4270 

4285 

4293 


.4301 

4309 

4317 

4325 

4333 

4342 

4350 

4358 

4366 

4374 

.47 

.4383 

4390 

4398 

4406 

4414 

1 4422 

4430 

4438 

4446 

4454 

as 

.4462 

4470 

4478 

4486 

4494 

4502 

4510 

4518 

4526 

4534 

■a 

.4542 

4550 

4558 

4566 

4574 

4582 

4500 

4508 

4605 

4613 


Reproduced, by permission of the author, from Numerical Tables by J. W. Campbell
(Department of Mathematics, University of Alberta, Canada).
TABLE VI. VALUES OF tanh z (cont.)

z      0     1     2     3     4     5     6     7     8     9

.50  .4621 .4629 .4637 .4645 .4653 .4660 .4668 .4676 .4684 .4692
.51  .4699 .4707 .4715 .4723 .4731 .4738 .4746 .4754 .4762 .4769
.52  .4777 .4785 .4792 .4800 .4808 .4815 .4823 .4831 .4839 .4846
.53  .4854 .4861 .4869 .4877 .4884 .4892 .4900 .4907 .4915 .4922
.54  .4930
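
Entries of Table VI can be regenerated directly when a value is in doubt. The following is a minimal sketch in Python (not part of the original text), assuming the table gives tanh z to four decimals, with each row labelled by z to two decimals and each column by the third decimal. The function names are illustrative only.

```python
import math

def tanh_entry(thousandths):
    """Four-decimal value of tanh z for z = thousandths / 1000."""
    return round(math.tanh(thousandths / 1000), 4)

def tanh_row(hundredths):
    """The ten entries of the row labelled z = hundredths / 100."""
    return [tanh_entry(10 * hundredths + k) for k in range(10)]

def print_table(first_row, last_row):
    """Print rows first_row..last_row in the layout of the printed table."""
    print("z    " + "".join(f"{k:>7d}" for k in range(10)))
    for h in range(first_row, last_row + 1):
        print(f".{h:02d}  " + "".join(f"{v:7.4f}" for v in tanh_row(h)))

if __name__ == "__main__":
    print_table(48, 50)
    # math.atanh is the inverse function: it recovers z from r = tanh z.
    print(math.atanh(math.tanh(0.3)))
```

Working in integer thousandths avoids accumulating rounding error in the argument; `math.atanh` serves for inverse look-ups, where the printed table would be read backwards.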
CHART I. CONFIDENCE LIMITS (95%) FOR THE BINOMIAL DISTRIBUTION*

* This chart is reproduced with the permission of Professor E. S. Pearson from Clopper,
C. J., and Pearson, E. S., “The Use of Confidence or Fiducial Limits Illustrated in the Case
of the Binomial,” Biometrika, vol. 26 (1934), p. 404.
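
Limits of the kind plotted in Chart I can also be computed numerically. The sketch below is an illustration in Python, not the procedure used to draw the chart: it assumes the usual Clopper-Pearson definition, in which each limit is the value of p that places the observed count x (out of n trials) on a tail of probability 0.025, and it finds that value by bisection. The function names are hypothetical.

```python
from math import comb

def upper_tail(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p); this increases with p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x, n + 1))

def solve_increasing(f, target, iterations=100):
    """Bisection on [0, 1] for the p at which the increasing function f equals target."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, confidence=0.95):
    """Two-sided confidence limits for p, given x successes in n trials."""
    alpha = 1 - confidence
    # Lower limit: P(X >= x) = alpha / 2 (taken as 0 when x = 0).
    lower = 0.0 if x == 0 else solve_increasing(
        lambda p: upper_tail(x, n, p), alpha / 2)
    # Upper limit: P(X <= x) = alpha / 2, i.e. P(X >= x + 1) = 1 - alpha / 2
    # (taken as 1 when x = n).
    upper = 1.0 if x == n else solve_increasing(
        lambda p: upper_tail(x + 1, n, p), 1 - alpha / 2)
    return lower, upper

if __name__ == "__main__":
    print(clopper_pearson(3, 10))
```

For x = 0 the upper limit has the closed form 1 - 0.025^(1/n), which gives a convenient check on the numerical solution.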
CHART II. CONFIDENCE LIMITS (95%) FOR THE CORRELATION COEFFICIENT*

Scale of r (= Sample Correlation Coefficient)
The numbers on the curves indicate sample size

* This chart is reproduced with the permission of Professor E. S. Pearson from David,
F. N., Tables of the Correlation Coefficient, The Biometrika Office, London.
Additional Answers


Page 19 

1. (a) 2, (b) 4; 2. (a) 10, (b) 34.5, 44.5, etc., (c) 39.5, 39.5, etc. 

Page 31 
1. ife. 

Page 41 

1. Q2 = 136.9 lb., Q1 = 126.0 lb., Q3 = 148.8 lb., Q = 11.4 lb.; 2. $5886; 6.
D = 128.6 lb., P = 143.8 lb.; 7. 13.2, 99.4.

Page 61 

4. 0; 9. 53; 11. 2.80 in.; 13. 68%; 14. $5218; 15. 42.2; 16. 73; 17. 1.72%;
18. 6 X 10; 19. 4.49%; 26. 8.000 > 6.423 > 4.520; 28. (a) 3.1 min. per
problem, (b) 21.3 problems per hr.

Page 87 

1. (a) 150 lbs., (b) 44; 4. 1.05%, 0.77; 6. 477.3 sec., 145.7 sec.; 10. 0.2135,
0.2659; 11. x̄ = 6, s = 2, M.A.D. = 1.5; 13. x̄ = 32, s = 5; 16. x̄ = 11,
s² = 10.4; 17. x̄ = 57.1, s = 8.75; 18. x̄ = 13, s = 7.21; 22. x̄1 = 55.7 in.,
s1 = 1.18 in.

Page 106 

1. (a) m1' = 2.81, m2 = 1.46, m3 = -0.495, (b) m1' = -0.5, m2 = 1.58, m3 = 0.

Page 131

8. 25.00, 16.13, 6.72, 1.80; 10. (a) Q3 = 22.36, (b) 17.64 to 22.36, (c) 24.49;
11. 63; 13. 30, 39, 45, 49, 52, 55, 58, 61, 64, 67, 70, 73, 76, 80, 86, 95; 16. f =
257.13 exp [-(x - 69.94)²/19.252]; 17. (a) 0.06%, (b) 10.5%; 18. 793, 4207;
19. (a) 13,760, (b) 23,350, (c) 103.6%, (d) 7.42%; 20. (e) 21, 341, 780, (f)
x = 43.3, 52.5, 56.7; 21. (b) 2314, (c) 2.28%, (d) 3.372; 22. 83, 62, 40, 17.

Page 140

9. 15; 11. 360; 21. 231; 82. m “ 0, (r» - J, 41. i 
Page 168 

8. (b) 0.2143, 0.1786; 9. μ = 5, σ = 1.826, α3 = 0.183; 10. 0.00620, 0.00196;
11. (a) 0.49, (b) values of fc are 6.8, 26.1, 37.4, 24.0, 5.8; 15. 0.487 (0.72)^x/x!;
16. fc = 58.2, 29.1, 7.3, 1.4; 17. fc = 108.7, 66.3, 20.2, 4.1, 0.7.


Page 173 

1. 0.919, 0.198; 3. No, P = 0.084; 4. No, P (one-sided) = 0.186; 5. 8.8% to
17.4%; 6. No, (p1 - p2)/(est. of s.d.) = 19.1; 7. 0.638 to 0.755; 8. No,
s = 1.2; 9. 0.41 to 0.77; 10. No, s = 2.42; 11. 0.043 to 0.285; 12. s = 1.60,
null hypothesis not rejected.
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IH 

$• 12.27 to 12.39 sec.; 3. 52 to 59; 7. * = 2.60, hypothesis rejected at 1% level;
8. 3.007 to 3.090. Slight bias to the right; 9. 0.84 to 2.26; 11. (a) no, (b) yes.
Yields more uniform in (b); 12. No; 14. Homogeneity not rejected (P > 0.1);
15. F = 1.95 with 4 and 16 d.f.; 16. Limits for the five samples are 25.9 to
58.1, 32.8 to 71.2, 56.8 to 74.2, 36.6 to 67.0, 42.8 to 104.2, for combined sample
31.9 to 82.0. M/c = 4.12, no reason to doubt homogeneity; 17. No, t = 1.09;
18. 5.57 to 7.43, 5.08 to 7.92.

Page 216 

1. P = 0.15 or 0.46; 2. P = 0.6, P = 0.85; 3. P = 0.38; 5. No, (pooled) =
5.44, P = 0.25; 6. Yes, P = 0.016; 10. R = 12.5, σ = 4.06, 99% limits
0.40 to 1.68.

Page 249 

2. (a) x + 2y = 14, y = 3x + 3; 3. 4x + Sy = 23; 4. 2, 1; 7. = 1.12;
11. Y = 397 + 41.4 (i - 1920); 16. 2v = i*.

Page 282 

4. = 0, Σd² = 93.7; 6. 5e/ = 0.469; 7. x̄ = 125, ȳ = 80, sx = 15,
sy = 9, r = 0.55; 9. (a) x̄ = 150, ȳ = 70, sx = 15, sy = 6, r = 0.25, Y = 0.1x +
55, X = 0.625y + 106.25, (b) 71 in., 151 lb.; 11. (a) 0.044, (b) 0.36, (c) 0.74;
13. (c) 0.866, (d) 0.943; 14. No; 18. (a) yes, (b) no (P = 0.17), (c) no
(P = 0.33); 19. 0.62 to 1.15; 20. 0.66 to 0.94; 21. (a) 20, 1575, 94.5, (b) 18,
1596, 88.7; 24. 2.37; 27. Y = 0.177x + 54.8, = 16.2, aa indeterminate.

Page 308

2. s.s. between students = 16229 (27 d.f.), s.s. between tests = 8825 (1 d.f.),
s.s. for error = 4982 (27 d.f.); 6. 0.636; 7. 0.64, agreement not significant;
11. Both non-significant; 12. Py* ~ Exy = 0.929, r = 0.927, both non-significant;
13. χ² = 76, C = 0.41; 14. Y = 1.35x + 46.8, = 0.482, Py/ = 0.584,
non-significant; 15. yes, χ² = 6.5; 16. Z = 3.41x + 0.0036y + 9.1; 17.
T»yg = -0.44, fy,., = 0.10, r„.y = 0.76, r„*y = 0.80.

Page 314

12. α3 = 0.52, α4 = 2.98; 13. x̄ = 73.2, s = 6.7; 14. 70.7 to 79.3, 100, 100;
16. 214; 18. x̄ = 150, ȳ = 68, sx = 15, sy = 2.5, r = 0.6; 21. (a) 0.135%,
(b) 2270; 26. y = 1, s,y = 1.73; 30. No, z = 1.60; 33. |; 34. P (one-sided)
= 0.0003, highly significant; 35. For means, t = 0.89 (non-significant),
for variances, F = 1.48 (non-significant); 36. s.s. between pairs = 223, s.s.
between diets = 22, s.s. for error = 247.4; 37. No, χ² = 4.04; 38. 0.479 to
0.541; 39. a = 6.72, n = 1.018; 48. yes, P = 0.13.
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